Problem: I'm working on a PowerShell script that will download the site's source code, find all the file targets, and then download said targets. Authentication isn't a concern for the moment, so on my test website I enabled anonymous authentication, enabled directory browsing, and disabled all the other default pages, so all I get is a listing of the files on my site. What I have so far is this:
$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$destination = "c:/users/administrator/desktop/test/"
$webclient.DownloadString($source)
The $webclient.DownloadString($source) call returns basically the source code of my site, and I can see the files I want wrapped in the rest of the code. My question is: what is the best and/or easiest way of isolating the links I want, so I can run a foreach to download all of them?
Also, for extra credit, how would I go about adding code to download folders, and the files within those folders, from my site? I can at least make separate scripts to pull the files from each subfolder, but obviously it would be much nicer to get it all in one script.
If you are on PowerShell v3, the Invoke-WebRequest cmdlet may be of help. To get an object representing the website:
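# a minimal sketch, reusing the URL from the question; $site is just an illustrative variable name
$site = Invoke-WebRequest -Uri "http://testsite/testfolder/"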
To get all the links in that website:
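# the parsed response object exposes the anchors it found
$site.Links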
And to just get a list of the href elements:
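# just the href values of those link objects
$site.Links | Select-Object -ExpandProperty href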
If you are on PowerShell v2 or earlier, you'll have to create an InternetExplorer.Application COM object and use that to navigate the page:
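Roughly like this (a sketch of the COM approach; the wait loop and property access may need tweaking for your page):

$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate("http://testsite/testfolder/")
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }   # wait for the page to finish loading
$ie.Document.getElementsByTagName("a") | ForEach-Object { $_.href }
$ie.Quit()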
Thanks to this blog post where I learnt about Invoke-WebRequest.

Update: One could also download the website source like you posted and then extract the links from the source. Something like this:
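# a sketch of that approach; the regex assumes href values wrapped in double quotes
$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$webclient.DownloadString($source) -split "<a\s+" | ForEach-Object {
    if ($_ -match '^href="([^"]+)"') {
        $Matches[1]    # output just the link target
    }
}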
The -split part splits the source along lines that start with <a followed by one or more spaces. The output is placed in an array, which I then pipe through a ForEach-Object block. There I match each line against the regexp, which extracts the link part and outputs it. If you want to do more with the output, you can pipe it further through another block which does something with it.
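For example, to do what your original script was after (download every file into $destination), the pieces could be put together roughly like this. It assumes the hrefs in the listing point at the files and that you want to keep the original file names; you may also want to filter out the parent-directory link that a directory listing usually includes:

$source      = "http://testsite/testfolder/"
$destination = "c:/users/administrator/desktop/test/"
$webclient   = New-Object System.Net.WebClient
$base        = New-Object System.Uri($source)

$webclient.DownloadString($source) -split "<a\s+" | ForEach-Object {
    if ($_ -match '^href="([^"]+)"') {
        $url  = New-Object System.Uri($base, $Matches[1])    # resolve relative hrefs against the folder URL
        $file = Join-Path $destination ([System.IO.Path]::GetFileName($url.LocalPath))
        $webclient.DownloadFile($url, $file)
    }
}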