Grab image links from an HTML website using PowerShell

Posted 2019-02-07 09:36

I'd like to download some image galleries in bulk. The images are offered for free, with no permission needed. For the life of me, I cannot get it to work. This is what I have so far. What ends up in $pattern is the whole HTML line, not just the image link. Are there any pointers you can give me? The loop is set to run only once for testing purposes; eventually it will go through all the pages, which are numbered sequentially.

# Variables
$i=1        # Webpage Counter
$j=1        # Image Counter
$rootDir = "http://website.com/sport/galleries/"
$saveDir = "C:\Users\user\Desktop\"
$webpagetxt = "C:\Users\user\Desktop\page.txt"
$links = "C:\Users\user\Desktop\links.txt"
$regex = "http://website.com/galleries/[0-9]*/[^\.]*.JPG"

# Create folder to download to
#New-Item -Name SiouxSportsGalleries -ItemType directory

# Start Web Client
$client = New-Object System.Net.WebClient

# Main loop to get image links and download
For($i=10; $i -le 10; $i++){

    # Download source code of the web page.
    $url = $rootDir+$i+'.htm'
    $webclient = new-object System.Net.WebClient
    $webpage = $webclient.DownloadString($url)
    $webpage > "$webpagetxt"

    # Parse web page and find image link.
    $pattern = Get-Content $webpagetxt | Select-String -pattern $regex -Allmatches
    echo "This is the link" $pattern
    #$pattern > $links

}

2 Answers
做个烂人
#2 · 2019-02-07 09:50

You need to extract the value that was matched. Select-String returns MatchInfo objects, and when you echo one, what you see is the result of $pattern.ToString(). ToString() returns the whole line, not the matched value. This will return only the links:

Get-Content $webpagetxt | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } }
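
To make the difference concrete, here is a small self-contained sketch. The sample HTML line and the example.com URLs are made up for illustration, and the dots in the regex are escaped here, which the regex in the question does not do.

```powershell
# Hypothetical sample line; the URLs are placeholders, not from the real site.
$line  = '<a href="http://example.com/galleries/12/a.JPG"><img src="http://example.com/galleries/12/b.JPG"></a>'
$regex = 'http://example\.com/galleries/[0-9]*/[^.]*\.JPG'

$result = $line | Select-String -Pattern $regex -AllMatches

# Converting the MatchInfo object to a string gives the whole input line.
$asString = $result.ToString()

# The matched substrings live in the Matches property.
$urls = $result.Matches | ForEach-Object { $_.Value }

$asString   # the full <a href=...> line
$urls       # only the image URLs
```

Without -AllMatches, Matches would hold only the first hit on each line, so the switch matters when a line contains several links.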

By the way, instead of saving the web page and reopening it with Get-Content, you can simply split the string on line breaks to get an array (if that was the only reason you saved it). :-)

$webpage -split "`n" | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } }

EDIT: To download the images, you could extend it with another foreach loop:

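$rootDir = "http://website.com/sport/galleries/"
$saveDir = "C:\Users\user\Desktop\"
# $webclient is the System.Net.WebClient created in the question's script
$webpage -split "`n" | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } } | % {
    # Get local path (this assumes the image URLs start with $rootDir)
    $local = $_.Replace($rootDir, $saveDir)
    # Create an empty file, along with any missing parent folders
    $file = New-Item $local -ItemType file -Force
    # Download the image into that file
    $webclient.DownloadFile($_, $file.FullName)
}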
劳资没心,怎么记你
#3 · 2019-02-07 09:53

Select-String returns an object with properties. Pipe it to Get-Member to see what goodies you have. You'll want to check out the Matches property, e.g. $pattern.Matches. See example 9 in the documentation.
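
As a rough illustration of that inspection step, the sketch below runs Select-String on a made-up string (the URL and the simplified regex are placeholders, not taken from the question) and lists the properties of the result.

```powershell
# Hypothetical input, used only to get a MatchInfo object to inspect.
$result = 'see http://example.com/galleries/1/x.JPG' |
    Select-String -Pattern 'http://\S+\.JPG' -AllMatches

# List the properties of the object that Select-String returned.
$result | Get-Member -MemberType Property

# Matches holds one System.Text.RegularExpressions.Match per hit.
$first = $result.Matches[0].Value
$first
```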
