-->

Search all of Google with Google Python API

Posted 2020-08-04 10:12

Question:

I will be using Python. My plan is to write a program that runs a batch of searches and checks how many results Google reports for each. So far, though, I have only managed to get a custom search engine to partly work.

In Python, how do I use the Google API to run a simple search against Google's main search engine? As I understand it, the answer has changed within the last few years as Google has pushed toward Google App Engine.

Answer 1:

Recently I was also looking for a Google Search API and was misled by a lot of outdated information. Here is what I found on the Google Developers website: https://developers.google.com/api-client-library/python/apis/customsearch/v1

According to the docs, your function will look something like this:

from googleapiclient.discovery import build


def google_results_count(query):
    # Build a client for the Custom Search API.
    service = build("customsearch", "v1",
                    developerKey="[put your API key here]")

    # cx identifies the custom search engine to query.
    result = service.cse().list(
            q=query,
            cx='[put your CSE key here]'
        ).execute()

    # Note: the API returns totalResults as a string.
    return result["searchInformation"]["totalResults"]

print(google_results_count('Python is awesome'))
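
Since the question is about checking a whole batch of searches, here is a minimal sketch of how the function above could be extended to several queries. The helper names `count_results`, `count_many` and `main` are my own, not part of the Google client library, and the second query string is only a placeholder. The service object is passed in as a parameter so the helpers can be tried with a stand-in object before spending any API quota.

```python
def count_results(service, query, cx):
    """Return the estimated result count for one query as an int."""
    response = service.cse().list(q=query, cx=cx).execute()
    # totalResults arrives as a string, so convert it before returning.
    return int(response["searchInformation"]["totalResults"])


def count_many(service, queries, cx):
    """Map each query to its estimated result count (one API call per query)."""
    return {query: count_results(service, query, cx) for query in queries}


def main():
    # Imported here so the helpers above do not require the library to be
    # installed until a real request is made.
    from googleapiclient.discovery import build

    service = build("customsearch", "v1",
                    developerKey="[put your API key here]")
    counts = count_many(service,
                        ["Python is awesome", "Python is slow"],
                        cx="[put your CSE key here]")
    for query, total in counts.items():
        print(query, total)
```

Calling `main()` with real keys filled in prints one line per query. Every query costs one request against your quota.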

Unfortunately, the CSE API gives you a different result count from the one you get with a regular web search. In the example above I got 2,680,000 in Python and approximately 21,000,000 for the same query on Google.com. Here is an explanation of why: https://support.google.com/customsearch/answer/70392?hl=en

Getting the API and CSE keys, and dealing with all the limitations of CSE, is a whole different story. I highly recommend looking at this answer: https://stackoverflow.com/a/11206266/1704272 and the one below it for the alternatives.

Another approach is to parse the HTML response from Google.com, which gives you the most complete results but is not very reliable because Google changes the HTML markup. More importantly, this is against their TOS; more to read here: Is it ok to scrape data from Google results?

My conclusion is that you have three options:

  1. Use the Google CSE API (free). Use this if you need to stay legal and you are sure you won't exceed the limit. It cannot be used in a public application.
  2. Use a paid API (Google's or any other, less expensive one). It is legal to use this for any public application, but be ready to pay for it.
  3. Scrape the Google web page. This will give you the best results, but I would use this option only for private needs.
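
If you go with option 1, one way to make sure you stay under the limit is to cache answers and count your own calls. Below is a rough sketch of that idea. The class name `BudgetedCounter` and the `max_calls` parameter are hypothetical, and the actual quota is something you need to look up for your own account. `fetch` is assumed to be any function that takes a query and returns its result count.

```python
class BudgetedCounter:
    """Cache result counts and refuse to go over a fixed number of calls."""

    def __init__(self, fetch, max_calls):
        self._fetch = fetch          # callable: query -> result count
        self._max_calls = max_calls  # hypothetical budget, not Google's real quota
        self._calls = 0
        self._cache = {}

    def count(self, query):
        # Repeated queries are answered from the cache and cost nothing.
        if query in self._cache:
            return self._cache[query]
        if self._calls >= self._max_calls:
            raise RuntimeError("query budget exhausted")
        # Count the call before making it, so failed requests still count.
        self._calls += 1
        self._cache[query] = self._fetch(query)
        return self._cache[query]
```

You would wrap your own search function with it, for example `BudgetedCounter(google_results_count, max_calls=50)`, and call `.count(query)` instead of calling the function directly. The counter lives in memory only, so it resets every time the program restarts.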