Getting links from webpage in Powershell using reg

Posted 2019-08-26 09:52

Question:

Below is my PowerShell code to fetch the links in a webpage. Intermittently, I get a "Cannot index into null array" exception. Is there anything wrong with this code?

$Download = $wc.DownloadString($Link) 
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }

Answer 1:

You don't need to parse anything yourself (and, as was pointed out in the comments, you can't reliably parse HTML with a regex in the first place). Use Invoke-WebRequest to fetch the page; the object it returns has a Links property, a collection of all the links on the page, already parsed out for you.

Example:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty links;

Or, if you need just the URLs, you can do it a bit more concisely:

$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
(Invoke-WebRequest -Uri $Link).links.href;
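
As for the intermittent error itself: a likely cause is that the code in the question reads $matches even when -match fails. A failed -match leaves $matches untouched, so it is either still $null (which gives "Cannot index into a null array") or still holds the result of an earlier successful match (which gives no error but a stale value). The first piece produced by -split is the text before the first <a tag, so it never starts with href=. Below is a minimal sketch of a guarded version, assuming you still want the regex approach; the sample HTML string is made up for illustration and stands in for the result of $wc.DownloadString($Link).

```powershell
# Hypothetical sample HTML, used here in place of $wc.DownloadString($Link)
$Download = '<p>intro</p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a> <a href=''/c''>C</a>'

$List = $Download -split '<a\s+' | ForEach-Object {
    # Read $Matches only when -match succeeded; after a failed match
    # $Matches is either $null or left over from an earlier match.
    if ($_ -match '^href=[''"]([^''">\s]*)') {
        $Matches[1]
    }
}

$List
```

Note that this sketch still skips the /b link, because href is not the first attribute of that tag. That is the kind of case the Links property of Invoke-WebRequest handles for you.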