invoke-webrequest pro tips

The Invoke-WebRequest PowerShell commandlet is great if you want to get and work with some web page’s output without installing any extra tools like wget.exe for example. If you’re planning to do some text parsing on a web page anyway, PowerShell is an excellent option, so why not go full PS mode?
Unfortunately the command has some drawbacks causing it to be a lot slower than it should be if you just want plain text and it’s response parsing can even cause it to lock up and not return a result at all.

So here’s some pro-tips for parsing the output using PowerShell fast and effectively:

1. Use basic parsing

The commandlet does some DOM parsing by default using Internet Explorer. This takes time and sometimes fails too, so if you want to skip this bit and make things faster, simply add the command-line switch UseBasicParsing:

$r = Invoke-WebRequest -UseBasicParsing

2. Split html in lines

Parsing text in PS is easy, but it’s even easier if the result is formatted like a text file with multiple lines instead of the full HTML in a single string. If you get the Content property from your webpage, you can split it up into separate lines by splitting on the newline character:

(Invoke-WebRequest -UseBasicParsing).Content -split "`n"

Or, if you also want the HTTP header info to be included in the result, use RawContent instead:

(Invoke-WebRequest -UseBasicParsing).RawContent -split "`n"

This can be really handy if you want to automatically check if the right response headers are set.
But you can also use the Headers collection on the result object, which is even easier.

3. Disable download progress meter shizzle to download large files (or always to speed things up)

That download progress bar is a nice visual and all when you’re using Invoke-WebRequest to download some large binaries and want to see it’s progress, but it significantly slows things down too. Set the $progressPreference variable and you’ll see your scripts download those files a lot faster.
The larger the files (like big as log files, images, video’s etc) the more this matters I’ve noticed.

$progressPreference = 'silentlyContinue'
invoke-webrequest $logurl -outfile .\logfile.log -UseBasicParsing
$progressPreference = 'Continue'

Be sure to reactivate this setting afterwards, because this affects any commandlet using that progress-bar feature.

4. No redirects please.

Invoke-WebRequest automatically follows an HTTP redirect (301/302) so you end up with the page you where looking for in the most cases.
If you want to test if a URL is properly redirected (or not redirected) this just makes things harder. In that case you can turn off redirects by using the MaximumRedirection parameter and setting it to 0

When you get a URL that returns a 301 when doing this, the command will throw an exception saying the maximum redirection counts has been exceeded. This makes this case easier to test.
The result object will also contain the redirect StatusCode.

$r = Invoke-WebRequest -MaximumRedirection 0

5. Use the PowerShell result object

It’s overkill in some cases, but in others this is pure win. The result object contains some really handy bits of the webpage, making a lot of tricky text and regex parsing obsolete.
It’s a piece of cake to parse all images linked from a page using the Image collection. Want to parse all outgoing links on a page? Use the Links collection. There’s also a StatusCode, a Headers collection a Forms and Inputfield collection for form parsing and more.
Check out what’s available using Get-Members:

Invoke-WebRequest | get-members

4. If all else fails, use wget.exe

Yep. Sometimes Invoke-WebRequest simply doesn’t cut it. I’ve seen it hang on some complex pages trying to parse it and failing miserably.
In that case you get fetch the page using the GNU WGet tool, download the page as a text file and then parse that.
You have to call wget by adding the exe extension part otherwise you’ll be triggering the PowerShell alias for Invoke-WebRequest again.

# Install WGet with Chocolatey
choco install wget

# Get the page and save it as a text file
wget.exe -O nj.html
# Read the file and parse it.
get-content nj.html | % { # parsing code goes here }

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.