Use PowerShell to automate website login and file download

Posted 2019-09-07 04:41

Question:

I want to use PowerShell to automate logging in to a website and downloading a PDF file. There are loads of examples on the internet that show how to do this (using Invoke-WebRequest, WebClient, HttpWebRequest, or InternetExplorer.Application), but most don't require a login first. Some show it with login, but I can't get them to work. I'm close with using InternetExplorer.Application:

$username = "xxxxx"
$password = "yyyyy"
$url = "https://example.com/login.aspx"
$usernameElementId = "aaaaa"
$passwordElementId = "bbbbb"
$submitButtonElementId = "ccccc"

$ie = New-Object -com InternetExplorer.Application
$ie.Visible = $true
$ie.Navigate($url)

while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}

$ie.Document.getElementById($usernameElementId).value = $username
$ie.Document.getElementById($passwordElementId).value = $password
$ie.Document.getElementById($submitButtonElementId).click()

while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}
Start-Sleep -m 2000

$url = "https://example.com/statements/201607.pdf"
$outFilePath = "C:\Downloads\Statement_201607.pdf"
$ie.Navigate($url)

while($ie.ReadyState -ne 4 -or $ie.Busy) {Start-Sleep -m 100}

# The script works up to this point: the PDF document is shown in IE.
# The file written in the next step is empty.

$ie.Document.body | Out-File -FilePath $outFilePath

My question: how do I get the PDF document downloaded in the last step of the script?

I've tried to do this same task with WebClient and Invoke-WebRequest, but I keep getting errors because of the authentication piece. I've tried capturing the cookies after login and passing them with the next request, but without success. If someone has a working example of doing this by another means, I'm all ears. In fact, my preference would be to avoid automating IE if possible, but I'll take any working solution.

Answer 1:

Ideally you would be able to use Invoke-WebRequest, as you said; however, this really depends on how the website handles authentication. The -Credential parameter only helps when the server uses HTTP authentication (Basic, NTLM, and so on). If the login page just checks a database and issues a session cookie, it's likely not going to work (but still worth a shot):

$url = "https://example.com/statements/201607.pdf"
$outFilePath = "C:\Downloads\Statement_201607.pdf"

# Passing a bare user name to -Credential prompts for the password
Invoke-WebRequest -Uri $url -Credential MyUser -OutFile $outFilePath
# MyUser can be replaced with a PSCredential object (for example, one returned by Get-Credential)
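
The credential object mentioned in the comment above can be built in a couple of lines. A minimal sketch, assuming the password is already held in a plain-text variable as in the question's script (the values below are the question's placeholders); for interactive use, Get-Credential avoids keeping the password in the script at all:

```powershell
# Placeholder values, as in the question's script
$username = "xxxxx"
$password = "yyyyy"

# Convert the plain-text password to a SecureString and wrap it in a PSCredential
$securePassword = ConvertTo-SecureString -String $password -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($username, $securePassword)

# The object can then be passed instead of a bare user name, so no prompt appears:
# Invoke-WebRequest -Uri $url -Credential $credential -OutFile $outFilePath
```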

Heck, try it without the -Credential parameter at all: depending on the site, the file might be publicly accessible (just not linked anywhere).
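
If the site signs users in through an HTML form and a session cookie rather than HTTP authentication, one approach that sometimes works without a browser is to carry a web session across requests with -SessionVariable and -WebSession. The following is only a sketch under several assumptions: it relies on the .Forms property, which exists in Windows PowerShell 5.1 but not in PowerShell 6 and later; it assumes the login form is the first form on the page; and Set-LoginField, Save-FileAfterFormLogin, and the field names are hypothetical and would have to match the real page (inspect $form.Fields first, since some pages also expect the submit button's name and value). Sites that depend on JavaScript during login will probably not work this way.

```powershell
# Hypothetical helper: copies the form's existing fields (including hidden inputs
# such as __VIEWSTATE) and sets the user name and password entries on the copy.
function Set-LoginField {
    param(
        [System.Collections.IDictionary]$Fields,
        [string]$UserField,
        [string]$PasswordField,
        [string]$UserName,
        [string]$Password
    )
    $body = @{}
    foreach ($key in $Fields.Keys) { $body[$key] = $Fields[$key] }
    $body[$UserField] = $UserName
    $body[$PasswordField] = $Password
    return $body
}

# Hypothetical helper: logs in through the form, then downloads a file with the same session.
function Save-FileAfterFormLogin {
    param(
        [string]$LoginUrl, [string]$FileUrl, [string]$OutFile,
        [string]$UserField, [string]$PasswordField,
        [string]$UserName, [string]$Password
    )
    # First request: fetch the login page and start a session that stores cookies
    $loginPage = Invoke-WebRequest -Uri $LoginUrl -SessionVariable session
    $form = $loginPage.Forms[0]   # assumption: the login form is the first form

    $loginArgs = @{
        Fields = $form.Fields; UserField = $UserField; PasswordField = $PasswordField
        UserName = $UserName; Password = $Password
    }
    $body = Set-LoginField @loginArgs

    # Resolve the form's action (often a relative path) against the login page URL
    $postUrl = [Uri]::new([Uri]$LoginUrl, $form.Action).AbsoluteUri

    # Second request: post the form with the same session so the login cookie is kept
    Invoke-WebRequest -Uri $postUrl -Method Post -Body $body -WebSession $session | Out-Null

    # Third request: download the file using the authenticated session
    Invoke-WebRequest -Uri $FileUrl -WebSession $session -OutFile $OutFile
}

# Example call, using the placeholder values from the question:
# Save-FileAfterFormLogin -LoginUrl "https://example.com/login.aspx" `
#     -FileUrl "https://example.com/statements/201607.pdf" `
#     -OutFile "C:\Downloads\Statement_201607.pdf" `
#     -UserField "aaaaa" -PasswordField "bbbbb" -UserName "xxxxx" -Password "yyyyy"
```

If the saved file turns out to be HTML rather than a PDF, the login post most likely did not succeed and the site returned its login page instead.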

Depending on the site, they may offer an API for downloading statements; contact them at your discretion. If they do, something like this might work:

$proxy = New-WebServiceProxy -Uri "https://example.com/webservices.asmx" -Credential MyUser
# Again MyUser can be substituted with a credential object
$proxy.GetMyStatement("201607") | Out-File $outFilePath
# Name and syntax depend on how it is designed and may vary wildly from example

And as a last resort...

# Wait for the download dialog box to pop up
Start-Sleep -Seconds 5
while ($ie.Busy) { Start-Sleep -Seconds 1 }

# Bring IE to the foreground and press "S" to choose "Save" in the download prompt
$obj = New-Object -ComObject WScript.Shell
$obj.AppActivate('Internet Explorer')
$obj.SendKeys('s')

# Press "Enter" to save the file
$obj.SendKeys('{Enter}')

# Close the IE downloads window
$obj.SendKeys('{TAB}')
$obj.SendKeys('{TAB}')
$obj.SendKeys('{TAB}')
$obj.SendKeys('{Enter}')

Note that you will need to disable any in-browser PDF viewer so that IE treats the PDF as a standard download. In IE11 this can be tricky because the setting is managed by the PDF viewer itself; with Adobe Reader, it seems you need to uninstall the BrowserIntegration feature. Basically, when you click the link manually, you want to get the "Run or Save?" prompt.
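
Because SendKeys gives no feedback on whether the save actually happened, it may be worth checking the result rather than trusting fixed sleeps. A minimal sketch: Test-PdfHeader and Wait-ForPdf are hypothetical helpers that poll until the file exists and begins with the "%PDF" signature that PDF files start with, which also catches an empty file or an HTML login page saved under a .pdf name. The path in the usage comment is the one from the question; IE may save to a different folder.

```powershell
# Hypothetical helper: returns $true if the file exists and starts with the bytes "%PDF".
function Test-PdfHeader {
    param([string]$Path)
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) { return $false }
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = $null
    try {
        # Open read-only and allow other processes to keep the file open
        $stream = [System.IO.File]::Open($fullPath, 'Open', 'Read', 'ReadWrite')
        $buffer = New-Object byte[] 4
        $read = $stream.Read($buffer, 0, 4)
        if ($read -lt 4) { return $false }   # empty or truncated file
        return ([System.Text.Encoding]::ASCII.GetString($buffer) -ceq '%PDF')
    }
    catch {
        return $false   # for example, the browser still holds an exclusive lock
    }
    finally {
        if ($stream) { $stream.Dispose() }
    }
}

# Hypothetical helper: polls until the file looks like a PDF or the timeout expires.
function Wait-ForPdf {
    param([string]$Path, [int]$TimeoutSeconds = 30)
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (Test-PdfHeader -Path $Path) { return $true }
        Start-Sleep -Milliseconds 500
    }
    return (Test-PdfHeader -Path $Path)   # one final check after the deadline
}

# Example use after the SendKeys sequence:
# if (-not (Wait-ForPdf -Path "C:\Downloads\Statement_201607.pdf")) {
#     Write-Warning "The statement was not saved as a valid PDF."
# }
```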