I want to use PowerShell to automate logging in to a website and downloading a PDF file. There are loads of examples on the internet that show how to do this (using Invoke-WebRequest, WebClient, HttpWebRequest, or InternetExplorer.Application), but most don't require a login first. Some show it with a login, but I can't get them to work. I'm close using InternetExplorer.Application:
$username = "xxxxx"
$password = "yyyyy"
$url = "https://example.com/login.aspx"
$usernameElementId = "aaaaa"
$passwordElementId = "bbbbb"
$submitButtonElementId = "ccccc"
$ie = New-Object -com InternetExplorer.Application
$ie.Visible = $true
$ie.Navigate($url)
while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}
$ie.Document.getElementById($usernameElementId).value = $username
$ie.Document.getElementById($passwordElementId).value = $password
$ie.Document.getElementById($submitButtonElementId).click()
while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}
Start-Sleep -m 2000
$url = "https://example.com/statements/201607.pdf"
$outFilePath = "C:\Downloads\Statement_201607.pdf"
$ie.Navigate($url)
while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}
# The script works up to this point: the PDF document is shown in IE.
# The file written in the next step is empty.
$ie.Document.body | Out-File -FilePath $outFilePath
My question: how do I get the PDF document downloaded in the last step of the script?
I've tried to do this same task with WebClient and Invoke-WebRequest, but I keep getting errors because of the authentication piece. I've tried capturing the cookies after login and passing them with the next request, but that didn't work either. If someone has a working example of doing this by other means, I'm all ears. In fact, my preference would be to avoid automating IE if possible, but I'll take any working solution.
Ideally you would be able to use Invoke-WebRequest, as you have said; however, this really depends on how the website is set up. If the login just queries a database and generates a cookie from the result, passing credentials with the Credential parameter likely won't work (but it's still worth a shot).
Heck, try it without the Credential parameter at all; depending on the site, the file might be publicly available (just not linked anywhere).
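The variants above could be sketched roughly as follows. This is only a sketch under assumptions: it targets Windows PowerShell 5.1 (the Forms property is not populated in PowerShell 6+), the function names Save-WebFile and New-FormLoginSession are made up for illustration, and whether the site accepts HTTP-level credentials, or a plain form post, is exactly what you would have to find out. The second helper also tries the cookie route from the question by letting Invoke-WebRequest carry the session itself.

```powershell
# Sketch only; assumes Windows PowerShell 5.1.

function Save-WebFile {
    # Downloads $Uri to $OutFile. Optionally sends HTTP-level credentials
    # (-Credential) or reuses an existing web session with its cookies.
    param(
        [Parameter(Mandatory)] [string] $Uri,
        [Parameter(Mandatory)] [string] $OutFile,
        [System.Management.Automation.PSCredential] $Credential,
        $WebSession
    )
    $params = @{ Uri = $Uri; OutFile = $OutFile }
    if ($Credential) { $params.Credential = $Credential }
    if ($WebSession) { $params.WebSession = $WebSession }
    Invoke-WebRequest @params
}

function New-FormLoginSession {
    # Hypothetical helper: loads the login page, fills in its first form,
    # posts it back, and returns the session holding any cookies issued.
    param(
        [Parameter(Mandatory)] [string] $LoginUrl,
        [Parameter(Mandatory)] [hashtable] $FieldValues   # field key -> value
    )
    $page = Invoke-WebRequest -Uri $LoginUrl -SessionVariable session
    $form = $page.Forms[0]   # assumption: the login form is the first form
    # The keys in $form.Fields may differ from the element ids used in the
    # IE script; inspect $form.Fields to find the right ones.
    foreach ($key in $FieldValues.Keys) { $form.Fields[$key] = $FieldValues[$key] }
    # Resolve a possibly relative form action against the login URL.
    $action = [Uri]::new([Uri]$LoginUrl, [string]$form.Action).AbsoluteUri
    Invoke-WebRequest -Uri $action -WebSession $session -Method Post -Body $form.Fields | Out-Null
    return $session
}

# Possible usage (URLs, field keys, and paths are the placeholders from the question):
#   Save-WebFile -Uri $pdfUrl -OutFile $outFilePath -Credential (Get-Credential)
#   Save-WebFile -Uri $pdfUrl -OutFile $outFilePath
#   $s = New-FormLoginSession -LoginUrl $url -FieldValues @{ aaaaa = $username; bbbbb = $password }
#   Save-WebFile -Uri $pdfUrl -OutFile $outFilePath -WebSession $s
```

If the form post succeeds, the session object carries the authentication cookie to the second request, so no manual cookie copying is needed. Sites that build the login request in JavaScript will probably not work this way.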
Depending on the site, they may offer an API for downloading it; contact them at your discretion.
And as a last resort, fall back to automating IE.
Note that you will need to disable any in-browser PDF viewer so that IE treats the file as a standard download. In IE11 this can be tricky, because it is managed by the PDF viewer itself; if you're using Adobe Reader, it seems you need to uninstall its BrowserIntegration feature. Basically, when you click the link manually, you want to get the "Run or Save?" prompt.
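One way the IE fallback might look, once the download prompt appears instead of the viewer, is to answer the prompt with a keystroke. This is a fragile sketch and not a definitive implementation: the function name Save-IEDownload is made up, the fixed wait is a guess, and it assumes that Alt+S triggers Save on the IE11 prompt (English UI) and that the file then lands in IE's default download folder.

```powershell
function Save-IEDownload {
    # Navigates an already logged-in InternetExplorer.Application object to
    # $Url and sends Alt+S to the IE window to accept the download prompt.
    param(
        [Parameter(Mandatory)] $InternetExplorer,
        [Parameter(Mandatory)] [string] $Url,
        [int] $WaitMilliseconds = 3000   # assumption: long enough for the prompt
    )
    $InternetExplorer.Visible = $true    # SendKeys needs a visible window
    $InternetExplorer.Navigate($Url)
    Start-Sleep -Milliseconds $WaitMilliseconds

    $shell = New-Object -ComObject WScript.Shell
    # AppActivate also matches window titles that end with the given text.
    if ($shell.AppActivate('Internet Explorer')) {
        Start-Sleep -Milliseconds 500
        $shell.SendKeys('%s')            # Alt+S: assumed shortcut for "Save"
    }
}

# Possible usage after the login steps from the question:
#   Save-IEDownload -InternetExplorer $ie -Url "https://example.com/statements/201607.pdf"
```

Because this drives the UI with keystrokes, it breaks if another window grabs focus, so it is best kept for cases where the request-based approaches fail.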