Scraping/Parsing Google search results in Ruby

Assume I have the entire HTML of a Google search results page. Does anyone know of any existing code (Ruby?) to scrape/parse the first page of Google search results? Ideally it would handle the Shopping Results and Video Results sections that can spring up anywhere.

If not, what's the best Ruby-based tool for screenscraping in general?

To clarify: I'm aware that it's difficult/impossible to get Google search results programmatically/API-wise AND that simply CURLing results pages has a lot of issues. There's consensus on both of these points here on Stack Overflow. My question is different.

Tags: ruby google-search google-search-api

6 answers

我只想做你的唯一

#2 · 2019-02-07 09:36

You should be able to accomplish your goal easily with Mechanize.

Edit: Actually, if you already have the results, all you need is Hpricot or Nokogiri.


Juvenile、少年°

#3 · 2019-02-07 09:38

This should be very simple. Have a look at the "Screen Scraping with ScrAPI" screencast by Ryan Bates. You can also do without a dedicated scraping library and stick to something simple like Nokogiri.

Update:

From nokogiri's documentation:

  require 'nokogiri'
  require 'open-uri'

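  # Get a Nokogiri::HTML::Document for the page we're interested in...
  # (URI.open is needed on Ruby 3.0+, where a bare `open` no longer accepts URLs.)

  doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))

  # Do funky things with it using Nokogiri::XML::Node methods...
  # The selectors below reflect Google's markup when this was written and may need updating.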

  ####
  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end

  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end


Explosion°爆炸

#4 · 2019-02-07 09:41

I would suggest HTTParty plus the Google AJAX Search API.


狗以群分

#5 · 2019-02-07 09:46

I'm unclear as to why you want to be screen scraping in the first place. Perhaps the REST search API would be more appropriate? It will return the results in JSON format, which will be much easier to parse, and save on bandwidth. For example, if your search was 'foo bar', you could just send a GET request to http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=foo+bar and handle the response.

For more information, see this blog post or the official documentation.
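To make the request/response handling concrete, here is a rough sketch using only Ruby's standard library. The helper names `search_url` and `extract_results` are made up for illustration, and the JSON shape in `sample_body` (`responseData`, `results`, `titleNoFormatting`, `url`) is an assumption about what the API returned. As far as I know this endpoint has since been retired, so the sketch parses a hard-coded sample instead of making a live request.

```ruby
require 'json'
require 'uri'

# Build the query URL; URI.encode_www_form escapes the search terms.
def search_url(query)
  params = URI.encode_www_form(v: '1.0', q: query)
  "http://ajax.googleapis.com/ajax/services/search/web?#{params}"
end

# Assumed response shape, hard-coded so the example runs offline.
sample_body = <<~JSON
  {
    "responseData": {
      "results": [
        { "titleNoFormatting": "Foo Bar", "url": "http://example.com/foo" },
        { "titleNoFormatting": "Bar Foo", "url": "http://example.com/bar" }
      ]
    },
    "responseStatus": 200
  }
JSON

# Pull title/URL pairs out of a response body, tolerating a missing result list.
def extract_results(body)
  data = JSON.parse(body)
  items = data.dig('responseData', 'results') || []
  items.map { |item| { title: item['titleNoFormatting'], url: item['url'] } }
end

puts search_url('foo bar')
extract_results(sample_body).each { |r| puts "#{r[:title]} -> #{r[:url]}" }
```

With a live JSON API you would pass the HTTP response body to `extract_results` instead of `sample_body`.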


别忘想泡老子

#6 · 2019-02-07 09:49

I don't know of any Ruby-specific code, but this Google scraper could help you. It's an online demo tool that scrapes and parses Google results. The most interesting part is the accompanying article, which explains the parsing process in PHP; the approach is applicable to Ruby and any other programming language.


趁早两清

#7 · 2019-02-07 09:54

Scraping has become harder and harder as Google keeps changing and expanding how the results are structured (rich snippets, knowledge graph, direct answers, etc.). We built a service that handles part of this complexity, and we have a Ruby library. It's pretty straightforward to use:

query = GoogleSearchResults.new q: "coffee"

# Parsed Google results into a Ruby hash
hash_results = query.get_hash
