How to parse og meta tags using HTTParty for Rails

I am trying to parse og meta tags with the HTTParty gem, using this code:

link = 'http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/'
# link = 'http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html'
resp = HTTParty.get(link)
ret_body = resp.body

# title
og_title = ret_body.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/)
og_title = og_title[1].to_s

The problem is that it worked on some sites (Yahoo!) but not on others (USA Today).

Tags: ruby-on-rails ruby opengraph httparty

3 Answers

做个烂人

Reply #2 · 2019-05-30 14:49

Perhaps I can offer an easier solution? Check out the OpenGraph gem.

It's a simple library for parsing Open Graph protocol information from web sites and should solve your problem.


我想做一个坏孩纸

Reply #3 · 2019-05-30 15:02

Don't parse HTML with regular expressions; they're too fragile for anything but the simplest problems. A tiny change to the HTML can break the pattern, leaving you in a slow battle to maintain an ever-expanding pattern. It's a war you won't win.
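To make the fragility concrete, here is a small offline sketch that runs the pattern from the question against a few variants of the same tag. The sample strings and the names OG_TITLE, VARIANTS and RESULTS are made up for illustration; they are not taken from either site.

```ruby
# The pattern from the question, unchanged.
OG_TITLE = /\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/

# Hypothetical variants of one tag; a browser treats them all the same way.
VARIANTS = {
  compact:        '<meta property="og:title" content="Hello"/>',
  space_before:   '<meta property="og:title" content="Hello" />',
  swapped_attrs:  '<meta content="Hello" property="og:title"/>',
  single_quotes:  "<meta property='og:title' content='Hello'/>",
  html5_no_slash: '<meta property="og:title" content="Hello">'
}

# true where the pattern matched, false where it did not.
RESULTS = VARIANTS.transform_values { |html| !OG_TITLE.match(html).nil? }

RESULTS.each do |name, matched|
  puts format('%-15s %s', name, matched ? 'match' : 'no match')
end
```

Only the compact form matches. Each of the other spellings would need its own addition to the pattern.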

Instead, use an HTML parser. Ruby has Nokogiri, which is excellent. Here's how I'd do what you want:

require 'nokogiri'
require 'httparty'

%w[
  http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/
  http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html
].each do |link|
  resp = HTTParty.get(link)

  doc = Nokogiri::HTML(resp.body)
  puts doc.at('meta[property="og:title"]')['content']
end

Which outputs:

Jets fire offensive coordinator Tony Sparano
Chicago lottery winner's death ruled a homicide


Juvenile、少年°

Reply #4 · 2019-05-30 15:02

Solution:

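og_title = ret_body.match(/<meta property="og:title" content="(.*?)"\s*\/?>/i)
og_title = og_title ? og_title[1] : nil  # nil when the tag is not found

Whitespace before the closing /> broke the match, so make sure to allow for it. The pattern now ends in \s*\/?>, which accepts the tag with or without whitespace (and with or without the slash) before the closing >. Note that square brackets in a regex form a character class, not an OR clause, so alternatives cannot be listed inside them.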
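A quick way to check a whitespace-tolerant pattern offline is to run it against inline strings. This is only a sketch: the constant OG_TITLE_TOLERANT, the helper og_title_from and the sample tags are assumptions made for illustration. It uses String#[] with a capture index, which returns nil instead of raising when nothing matches.

```ruby
# Assumed pattern: optional whitespace, optional slash, then '>' closes the tag.
# The /i flag replaces the [Mm][Ee][Tt][Aa] spelling.
OG_TITLE_TOLERANT = /<meta property="og:title" content="(.*?)"\s*\/?>/i

# Hypothetical helper: returns the first capture group, or nil if there is no match.
def og_title_from(html)
  html[OG_TITLE_TOLERANT, 1]
end

puts og_title_from('<meta property="og:title" content="Hello"/>').inspect
puts og_title_from('<META property="og:title" content="Hello" />').inspect
puts og_title_from('<meta content="Hello" property="og:title"/>').inspect
```

The first two calls print "Hello" and the third prints nil, because swapped attributes still fall outside what the pattern covers.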

