Problem: I'm working on a PowerShell script that will download the site's source code, find all the file targets, and then download said targets. Authentication isn't a concern for the moment, so on my test website I enabled anonymous authentication, enabled directory browsing, and disabled all the other default pages, so all I get is a listing of the files on my site. What I have so far is this:
$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$destination = "c:/users/administrator/desktop/test/"
$webclient.DownloadString($source)
The $webclient.DownloadString($source) call returns basically the source code of my site, and I can see the files I want wrapped in the rest of the code. My question is: what is the best and/or easiest way of isolating the links I want, so I can run a foreach to download all of them?
Also, for extra credit, how would I go about adding code to download folders, and the files within those folders, from my site? I can at least make separate scripts to pull the files from each subfolder, but obviously it would be much nicer to get it all in one script.
If you are on PowerShell v3, the Invoke-WebRequest cmdlet may be of help. To get an object representing the website:
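# a minimal sketch, reusing the URL from the question; $site is just an illustrative variable name
$site = Invoke-WebRequest -Uri "http://testsite/testfolder/"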
To get all the links in that website:
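# the parsed response object exposes the anchors it found
$site.Links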
And to just get a list of the href elements:
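# just the href values of those link objects
$site.Links | Select-Object -ExpandProperty href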
If you are on PowerShell v2 or earlier, you'll have to create an InternetExplorer.Application COM object and use that to navigate the page:
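Roughly like this (a sketch of the COM approach; the wait loop and property access may need tweaking for your page):

$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate("http://testsite/testfolder/")
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }   # wait for the page to finish loading
$ie.Document.getElementsByTagName("a") | ForEach-Object { $_.href }
$ie.Quit()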
Thanks to this blog post where I learnt about Invoke-WebRequest.

Update: One could also download the website source like you posted and then extract the links from the source. Something like this:
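# a sketch of that approach; the regex assumes href values wrapped in double quotes
$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$webclient.DownloadString($source) -split "<a\s+" | ForEach-Object {
    if ($_ -match '^href="([^"]+)"') {
        $Matches[1]    # output just the link target
    }
}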
The -split part splits the source along lines that start with <a followed by one or more spaces. The output is placed in an array, which I then pipe through a ForEach-Object block. There I match each line against the regexp, which extracts the link part and outputs it. If you want to do more with the output, you can pipe it further through another block which does something with it.
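For example, to do what your original script was after (download every file into $destination), the pieces could be put together roughly like this. It assumes the hrefs in the listing point at the files and that you want to keep the original file names; you may also want to filter out the parent-directory link that a directory listing usually includes:

$source      = "http://testsite/testfolder/"
$destination = "c:/users/administrator/desktop/test/"
$webclient   = New-Object System.Net.WebClient
$base        = New-Object System.Uri($source)

$webclient.DownloadString($source) -split "<a\s+" | ForEach-Object {
    if ($_ -match '^href="([^"]+)"') {
        $url  = New-Object System.Uri($base, $Matches[1])    # resolve relative hrefs against the folder URL
        $file = Join-Path $destination ([System.IO.Path]::GetFileName($url.LocalPath))
        $webclient.DownloadFile($url, $file)
    }
}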