What is the correct way to get google search resul

Posted 2019-02-09 15:11

I want to get all the search results for a particular keyword search on Google. I've seen suggestions of scraping, but this seems like a bad idea. I've seen gems (I plan on using Ruby) that do scraping and gems that use the API. I've also seen suggestions of using the API directly.

Does anyone know the best way to do this right now? The API is no longer supported, and I've seen people report that they get unusable data back. Do the gems help solve this or not?

Thanks in advance.

6 answers
\"骚年 ilove
Reply #2 · 2019-02-09 15:52

According to http://code.google.com/apis/websearch/ , the Search API has been deprecated -- but there's a replacement, the Custom Search API. Will that do what you want?

If so, a quick Web search turned up https://github.com/alexreisner/google_custom_search , among other gems.

ら.Afraid
Reply #3 · 2019-02-09 15:54

You will eventually get 503 errors if you run a scraper against Google search result pages. A more scalable (and legal) approach is to use Google's Custom Search API.

The API provides 100 search queries per day for free. If you need more, you may sign up for billing in the Google Developers Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day.
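To make the pricing above concrete, here is a minimal Ruby sketch of the arithmetic. The helper name `daily_cost_usd` is hypothetical, the figures are the ones quoted in this answer (they may have changed since), and it assumes that the 100 free queries are deducted before billing, that billing is pro rata per query, and that the 10k cap applies to the total daily volume.

```ruby
# Pricing figures as quoted in this answer; check current pricing before relying on them.
FREE_QUERIES_PER_DAY = 100
PRICE_PER_1000_USD   = 5.0
MAX_QUERIES_PER_DAY  = 10_000

# Hypothetical helper: estimated daily cost in USD for a given query volume.
# Assumes the free quota is deducted first and billing is pro rata per query.
def daily_cost_usd(queries)
  raise ArgumentError, "queries must be non-negative" if queries.negative?
  raise ArgumentError, "daily cap is #{MAX_QUERIES_PER_DAY} queries" if queries > MAX_QUERIES_PER_DAY

  billable = [queries - FREE_QUERIES_PER_DAY, 0].max
  billable * PRICE_PER_1000_USD / 1000
end

puts daily_cost_usd(100)    # within the free tier
puts daily_cost_usd(1_100)  # 1,000 billable queries
```

Under these assumptions, 1,100 queries in a day would come to $5.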

The example below gets Google search results in JSON format:

require 'uri'
require 'httparty'
require 'pp'

def get_google_search_results(search_phrase)
  # assign api key and Custom Search engine ID (the required cx parameter)
  api_key = "Your api key here"
  search_engine_id = "Your search engine ID here"

  # encode search phrase (URI::encode was removed in Ruby 3.0)
  search_phrase_encoded = URI.encode_www_form_component(search_phrase)

  # get api response; num accepts at most 10 results per request
  response = HTTParty.get("https://www.googleapis.com/customsearch/v1?q=#{search_phrase_encoded}&key=#{api_key}&cx=#{search_engine_id}&num=10")

  # pretty print api response
  pp response

  # return the url of the first search result
  response["items"][0]["link"]
end

get_google_search_results("Top Movies in Theatres")
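One caveat when asking for many results: the Custom Search JSON API returns at most 10 results per request (`num` accepts 1 to 10) and also needs the ID of your search engine in the `cx` parameter, so fetching more means paging with the 1-based `start` parameter. The sketch below only builds the request URIs and makes no network calls. The helper name `custom_search_page_uris` and the placeholder key and engine ID are assumptions for illustration, and as far as I know the API stops at 100 results per query in total.

```ruby
require 'uri'

# Hypothetical helper: builds the request URIs needed to fetch up to
# `total` results, 10 per page, using the API's 1-based `start` parameter.
def custom_search_page_uris(query, api_key:, cx:, total: 30)
  per_page = 10
  (1..total).step(per_page).map do |start|
    params = {
      q: query,
      key: api_key,
      cx: cx,
      num: [per_page, total - start + 1].min, # last page may be shorter
      start: start
    }
    URI::HTTPS.build(host: 'www.googleapis.com',
                     path: '/customsearch/v1',
                     query: URI.encode_www_form(params))
  end
end

# Placeholder credentials; substitute your own.
uris = custom_search_page_uris("Top Movies in Theatres",
                               api_key: "YOUR_API_KEY", cx: "YOUR_ENGINE_ID")
uris.each { |uri| puts uri }
```

Each URI can then be passed to `HTTParty.get(uri.to_s)` and the `items` arrays concatenated.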
在下西门庆
Reply #4 · 2019-02-09 16:07

The Custom Search API most likely is not what you're looking for. I'm pretty sure you have to set up a Custom Search engine which you use the API to query, and this can only search over a user-specified set of domains (i.e. you can't perform general web search).

If you need to perform a general Google search, then scraping is currently the only way to go. It's quite easy to write Ruby code that performs Google searches and scrapes the search result URLs (I did this myself for a summer research project), but it does violate Google's TOS, so be warned.

爷的心禁止访问
Reply #5 · 2019-02-09 16:09

I also go for the scraping option: it's quicker than asking Google for a key, and you are not limited to 100 search queries per day. Google's TOS is an issue though, as Richard points out. Here's an example I wrote that works for me; it's also useful if you want to connect through a proxy:

require 'mechanize'

agent = Mechanize.new
# optional: route requests through a proxy (replace with your own host and port)
agent.set_proxy '78.186.178.153', 8080
page = agent.get('http://www.google.com/')

# fill in and submit the search form
google_form = page.form('f')
google_form.q = 'new york city council'

page = agent.submit(google_form, google_form.buttons.first)

# result links have the form /url?q=<target>&...; print each target
page.links.each do |link|
  href = link.href.to_s
  next unless href =~ /url\?q=/

  puts href.split(/=|&/)[1]
end
仙女界的扛把子
Reply #6 · 2019-02-09 16:10
我欲成王,谁敢阻挡
Reply #7 · 2019-02-09 16:15

You can also use our API. We take care of the hard parts of scraping and parsing Google search results. Ruby bindings are available, and using them is as simple as:

query = GoogleSearchResults.new q: "coffee"
hash_results = query.get_hash

Repository: https://github.com/serpapi/google-search-results-ruby
