GNU Wget is a powerful tool when it comes to downloading files from the web or mirroring sites. Its command-line features can be daunting and not very obvious. With some experimentation, reading the (f..) manual and some Googling, you can get it to do some pretty neat tricks for you.
All of that is from the command line too, which is great if you want to schedule this kind of magic or use it in a script.
For example, you might want to warm up your site or WordPress blog so your homepage and all posts linked from it are present in your cache when a visitor arrives. I’m assuming you are using caching on your site, otherwise this is pretty pointless. For WordPress you can use a caching plugin like W3 Total Cache, for example.
With Wget, it goes like this:
wget.exe http://n3wjack.net --spider --no-directories --level=1 --recursive
The command line parameters (in order) mean something like:
- Crawl n3wjack.net.
- Crawl it like a spider (request the pages to check they are there, but don’t save them).
- Don’t create directories for downloads.
- Crawl 1 level deep (so anything linked on the homepage is OK, but don’t go deeper).
- Do this recursively (so it actually goes 1 level deep).
- Follow only links that start with "/209..." (it’s a regular expression).
This one is a trick to have it only follow links to blog-posts because my URL scheme begins with the year of the post (2015, 2016, …). It’s good until 2099, which should do the trick I guess. :)
This way I’m also avoiding it loading all tag, category or page links.
If your site has a different URL scheme you’ll have to change the accept regex pattern to fit your scheme.
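If you do change the pattern, it can help to try it on a handful of URLs before pointing Wget at your site. Here is a minimal Python sketch of that idea. The pattern and the sample URLs are made up for illustration (they are not the exact ones from my command line), and Python's regex dialect is not identical to the POSIX one Wget uses by default, so it’s safest to stick to simple constructs like literal text and character classes.

```python
import re

# Hypothetical pattern: a year from 2000 to 2099 as a path segment.
# This is an assumption for illustration, not the pattern from the post.
ACCEPT_PATTERN = re.compile(r"/20[0-9][0-9]/")


def accepted(urls, pattern=ACCEPT_PATTERN):
    """Return the URLs that contain a match for the pattern."""
    return [url for url in urls if pattern.search(url)]


if __name__ == "__main__":
    # Made-up URLs covering a post, a tag, a category and a page.
    sample_urls = [
        "http://n3wjack.net/2015/some-post/",
        "http://n3wjack.net/tag/wget/",
        "http://n3wjack.net/category/tools/",
        "http://n3wjack.net/about/",
    ]
    for url in accepted(sample_urls):
        print(url)
```

Only the URL with a year in its path should come out, which is the behavior you want from the accept regex: posts get crawled, tag, category and page links get skipped.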
You can download Wget from the GNU site. It’s Open Source and is available for Windows, Mac and various Unix systems.
For Chocolatey users, there is a wget package available to install it on your system.
A while ago I noticed that some of my older posts had some silly misspellings in them, so I was looking for a way to spell check all my posts in one shot. I couldn’t really find anything that was free, so I figured I’d try to write something myself to do this for me.
I knew about the free and open source Hunspell spell checker and that you can use it from the command line. So I thought that by combining it with the WordPress export XML file, which has the content of all your posts, it should be possible to spell check the whole lot.
The end result is a PowerShell script which reads the XML export file, runs the content through Hunspell, parses the spelling errors found and finally bundles it all into a simple HTML report.
It worked nicely for me, even though it’s pretty crude and simple. I only had to use this once, so I don’t see the point of fine-tuning it a lot further.
However, this could be handy for others who want to do the same thing, so I cleaned it up a bit, slapped a readme file on it and posted it on GitHub as the WordPress full site spell checker.
Check it out if you want to spell check your WordPress blog in a single run and maybe this will be good enough to get your job done. You can find more info on how to set up and use it on the GitHub page.
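To give an idea of the approach, here is a rough sketch of the first two steps in Python. This is not the PowerShell script from GitHub, just an illustration under a few assumptions: the export file uses the usual content:encoded element for the post body, Hunspell is on your PATH, and an en_US dictionary is installed. The function names are made up for this example, and it stops short of building the HTML report.

```python
import html
import re
import subprocess
import xml.etree.ElementTree as ET

# Namespaced tag WordPress export files use for the post body (content:encoded).
CONTENT_TAG = "{http://purl.org/rss/1.0/modules/content/}encoded"


def posts_from_export(xml_text):
    """Yield (title, plain_text) for every item in a WordPress export."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext(CONTENT_TAG, default="")
        # Crude tag stripping; good enough to feed words to a spell checker.
        text = re.sub(r"<[^>]+>", " ", body)
        yield title, html.unescape(text)


def misspelled_words(text, dictionary="en_US"):
    """Return the words Hunspell flags, using `hunspell -l` on stdin.

    The dictionary name is an assumption; use one you have installed.
    """
    result = subprocess.run(
        ["hunspell", "-l", "-d", dictionary],
        input=text, capture_output=True, text=True, check=True,
    )
    return sorted(set(result.stdout.split()))
```

You would read the export file into a string, loop over posts_from_export() and call misspelled_words() on each post’s text to get a list of suspect words per post.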
I use a number of analytics tools to see how few hits I get a month, and one of the things that annoyed me is that my own visits, as I’m writing posts or looking up older posts, also get counted. There’s a silly trick to avoid this and it’s so easy it’s stupid I didn’t think of it before.
To exclude yourself from those stats, all you need to do is make sure the analytics code doesn’t get included when you are browsing your own site. Here’s how it works.
- Put your web analytics script code in a sidebar text widget. Leave the title empty if you don’t want anything to show up.
- Click the “Visibility” button at the bottom of the widget panel.
- In the options, choose “Hide” if: “User” is “Logged in”.