Picking file names out of a website to download in

2019-03-06 07:33发布

Problem: I'm working on making a PowerShell script that will download the sites source code, find all the file targets, and then download said targets. I'm alright for authentication for the moment, so on my test website, I enabled anonymous authentication, enabled directory browsing, and disabled all other default pages, so all I get is a list of files on my site. What I have so far is this:

$source = "http://testsite/testfolder/"
$webclient = New-Object system.net.webclient
$destination = "c:/users/administrator/desktop/test/"
$webclient.downloadstring($source)

The $webclient.downloadstring will return basically the source code of my site, and I can see the files I want wrapped in the rest of the code. My question to you guys is what is the best and/or easiest ways of isolating the links I want so I can do a foreach command to download all of them?

Also, for extra credit, how would I go about adding in code to download folders and the files within those folders from my site? I can at least make seperate scripts to pull the files from each subfolder, but obviously it would be much nicer to get it all in one script.

1条回答
孤傲高冷的网名
2楼-- · 2019-03-06 08:33

If you are on PowerShell v3 the Invoke-WebRequest cmdlet may be of help.

To get an object representing the website:

Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell"

To get all the links in that website:

Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links

And to just get a list of the href elements:

Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links | select href

If you are on PowerShell v2 or earlier you'll have to create an InternetExplorer.Application COM object and use that to navigate the page:

$ie = new-object -com "InternetExplorer.Application"
# sleep for a second while IE launches
Start-Sleep -Seconds 1
$ie.Navigate("http://stackoverflow.com/search?tab=newest&q=powershell")
# sleep for a second while IE opens the page
Start-Sleep -Seconds 1
$ie.Document.Links | select IHTMLAnchorElement_href
# quit IE
$ie.Application.Quit()

Thanks to this blog post where I learnt about Invoke-WebRequest.

Update: One could also download the website source like you posted and then extract the links from the source. Something like this:

$webclient.downloadstring($source) -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }

The -split part splits the source along lines that start with <a followed by one or more spaces. The output is placed in an array which I then pipe through a foreach-object block. Here I match each line on the regexp which extracts the links part and outputs it.

If you want to do more with the output you can pipe it further through another block which does something with it.

查看更多
登录 后发表回答