Wget does not fetch Google search results

Posted 2019-04-30 08:07

Question:

I noticed that when I run wget https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo and similar queries, I don't get the search results, only the Google homepage.

There seems to be some redirect within the Google page. Does anyone know how to make wget fetch the results?

Answer 1:

You can use these curl commands to pull Google query results:

curl -sA "Chrome" -L 'http://www.google.com/search?hl=en&q=time' -o search.html

For an https URL:

curl -k -sA "Chrome" -L 'https://www.google.com/search?hl=en&q=time' -o ssearch.html

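The -A option sets a custom User-Agent string (here, "Chrome") in the request to Google. -s silences the progress output, -L follows redirects, and -o names the output file. -k skips TLS certificate verification; it is usually unnecessary for google.com and can be dropped.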



Answer 2:

The #q=foo part is your hint: it is a fragment identifier, which is never sent to the server. I'm guessing you copied this URL from your browser's address bar while using the live-search function. Live search relies on client-side scripting that wget does not execute, so you cannot rely on that URL to work; use Google's plain search URL with live search disabled instead. A URL pattern that seems to work looks like this: http://www.google.com/search?hl=en&q=foo.
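To make the fragment point concrete, here is a small sketch that splits the URL from the question at the first # using shell parameter expansion. The variable names (url, sent, fragment) are only illustrative, and the split mimics what an HTTP client does rather than calling wget itself.

```shell
# Single quotes matter: unquoted, the shell would treat each & as a
# command separator and run the pieces as separate commands.
url='https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo'

# Everything before the first '#' is what the client puts in the request.
sent="${url%%#*}"

# Everything after the first '#' is the fragment, which stays on the client.
fragment="${url#*#}"

printf 'sent to server: %s\n' "$sent"
printf 'kept by client: %s\n' "$fragment"
```

The query q=foo ends up only in the fragment, which is why the server answers with the plain homepage.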

However, I do notice that Google returns 403 Forbidden when called naïvely with wget, which indicates that it does not want to be queried this way. You can easily get past that by setting some other user-agent string, but do consider all the implications before doing so on a regular basis.
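A minimal sketch of the wget equivalent, assuming the plain search URL pattern above and the same "Chrome" user-agent string used in the curl answer. The helper names search_url and fetch_results are hypothetical, and the query is assumed to be a single word that needs no URL encoding.

```shell
# Hypothetical helper: build the plain search URL for a one-word query.
# No URL encoding is done, so spaces or special characters are not handled.
search_url() {
  printf 'http://www.google.com/search?hl=en&q=%s' "$1"
}

# Hypothetical helper: fetch the results page with a custom User-Agent.
# --user-agent replaces wget's default agent string; -O names the output file.
fetch_results() {
  wget --user-agent="Chrome" -O "$2" "$(search_url "$1")"
}
```

Calling fetch_results foo search.html would then save the results page for "foo" to search.html, provided Google accepts the request.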



Tags: bash wget