Batch searching on Google: 403 error

Posted 2019-09-15 00:37

I am trying to do a batch search: go over a list of strings and, for each one, print the first URL that Google Search returns:

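#!/usr/bin/python
import json
import urllib
import time  # only needed for the time.sleep(5) attempt mentioned below
import pandas as pd

df = pd.read_csv("test.csv")
saved_column = df.Name  # you can also use df['Name']

for name in saved_column:
  # Query the (now deprecated) Google AJAX Web Search API for each name
  query = urllib.urlencode({'q': name})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)

  # On an error such as the 403 below, responseData is null, so check the status first
  if results.get('responseStatus') != 200:
    print results
    continue

  data = results['responseData']
  address = data[u'results'][0][u'url']
  print address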

I get a 403 error from the server: 'Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors', u'responseStatus': 403
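Because the script indexes into responseData without looking at responseStatus, a blocked request surfaces as a confusing TypeError rather than as the 403. A minimal Python 3 sketch of a response check follows. It assumes the old AJAX API's JSON shape: a top-level responseStatus (visible in the error above), a responseData object holding a results list of items with a url key, responseData being null on errors, and an error message under a responseDetails key (that key name is an assumption). The SearchError class and first_result_url function are hypothetical helpers, not part of any Google library.

```python
import json


class SearchError(Exception):
    """Hypothetical error raised when responseStatus is not 200."""


def first_result_url(payload):
    """Return the first result URL from a decoded AJAX Search API response.

    Assumes the old response shape described above; returns None when the
    request succeeded but produced no results.
    """
    status = payload.get("responseStatus")
    if status != 200:
        # responseData is assumed to be null here, so indexing it would fail
        raise SearchError("search failed with status %s: %s"
                          % (status, payload.get("responseDetails")))
    results = payload["responseData"]["results"]
    if not results:
        return None
    return results[0]["url"]


# A payload shaped like the 403 in the question (message text copied from the error)
blocked = json.loads(
    '{"responseData": null,'
    ' "responseDetails": "Suspected Terms of Service Abuse.'
    ' Please see http://code.google.com/apis/errors",'
    ' "responseStatus": 403}'
)

try:
    first_result_url(blocked)
except SearchError as exc:
    print(exc)
```

Checking the status does not avoid the block; it only makes the actual reason visible instead of crashing on a null responseData.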

Is what I'm doing not allowed under Google's Terms of Service?

I also tried putting time.sleep(5) in the loop, but I get the same error.

Thanks in advance.

2 Answers
萌系小妹纸
#2 · 2019-09-15 00:59

It's not allowed by Google's Terms of Service. You really can't scrape Google without them noticing and blocking you. Their blocker is also fairly sophisticated: random delays may get you past it for a little while, but that stops working pretty quickly.

Sorry, you're out of luck on this one.

放我归山
#3 · 2019-09-15 01:02

https://developers.google.com/errors/?csw=1

The Google Search and Language APIs shown to the right have been officially deprecated.

Also

We received automated requests, such as scraping and prefetching. Automated requests are prohibited; all requests must be made as a result of an end-user action.
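Since the AJAX Search API is deprecated, a sanctioned route for programmatic lookups is Google's Custom Search JSON API, which neither answer mentions. Below is a minimal Python 3 sketch, assuming you have created an API key and a Programmable Search Engine ID (the cx parameter); those values, and the helper names build_search_url, first_link and search_first_link, are placeholders of my own. It assumes the documented response shape where results are in an items list whose entries carry a link field, and where items is omitted when there are no hits. Quotas, pricing and availability may have changed since this thread, so check the current documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Custom Search JSON API endpoint; requires your own API key and engine ID (cx)
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, num=1):
    """Build a request URL asking for the top `num` results for `query`."""
    params = urllib.parse.urlencode(
        {"key": api_key, "cx": engine_id, "q": query, "num": num}
    )
    return ENDPOINT + "?" + params


def first_link(payload):
    """Return the first result's link, or None if 'items' is absent or empty."""
    items = payload.get("items") or []
    return items[0]["link"] if items else None


def search_first_link(api_key, engine_id, query, timeout=10):
    """Perform one API request and return the first result link (network call)."""
    url = build_search_url(api_key, engine_id, query)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return first_link(payload)
```

Usage would mirror the question's loop: call search_first_link(API_KEY, ENGINE_ID, name) for each name in df.Name, staying within the API's daily query quota rather than trying to evade rate limits.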
