Get title, content via link in rails

I just started learning Rails. Could you help me understand how to parse a single link? A good tutorial would help too.

The question:

When you submit a link on Digg, Facebook, etc. and choose to attach it, the site parses the link and fetches the title, content, and images for that URL. How can something similar be implemented in Rails?

I have looked at feed parsers such as Feedzirra, but they seem to fetch a website's complete feed, not just the single link I am interested in. Or am I making a mistake somewhere?

Thanks so much in advance.

Tags: ruby-on-rails ruby ruby-on-rails-3 parsing web-scraping

3 answers

放我归山

Reply #2 · 2019-02-03 20:45

ootoovak's answer is correct, but I prefer Mechanize as an alternative. With Mechanize, this would work for you:

require 'mechanize'

agent = Mechanize.new                    # Create a new Mechanize agent
agent.get("http://domain.de/page.html")  # Fetch the page at the given URL
agent.page.title                         # Return the title of the fetched page

To install Mechanize, add gem 'mechanize' to your Gemfile and run bundle install.


beautiful°

Reply #3 · 2019-02-03 20:52

Looks like you might be looking for something like Pismo: https://github.com/peterc/pismo

require 'pismo'

# Load a Web page (you could pass an IO object or a string with existing HTML data along, as you prefer)
doc = Pismo::Document.new('http://www.rubyinside.com/cramp-asychronous-event-driven-ruby-web-app-framework-2928.html')

doc.title     # => "Cramp: Asychronous Event-Driven Ruby Web App Framework"
doc.author    # => "Peter Cooper"
doc.lede      # => "Cramp (GitHub repo) is a new, asynchronous evented Web app framework by Pratik Naik of 37signals (and the Rails core team). It's built around Ruby's EventMachine library and was designed to use event-driven I/O throughout - making it ideal for situations where you need to handle a large number of open connections (such as Comet systems or streaming APIs.)"
doc.keywords  # => [["cramp", 7], ["controllers", 3], ["app", 3], ["basic", 2], ..., ... ]

One caveat about images:

Image extraction only handles images with absolute URLs.
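
If you end up collecting image src values yourself, relative ones can be resolved against the page URL before you use them. Here is a minimal sketch using Ruby's standard URI library. The helper name absolutize_image_urls and the example.com URLs are made up for illustration and are not part of Pismo.

```ruby
require 'uri'

# Hypothetical helper (not part of Pismo): resolve each src value
# against the URL of the page it was found on.
def absolutize_image_urls(page_url, srcs)
  srcs.map { |src| URI.join(page_url, src).to_s }
end

# Assumed example data: a page URL and the src attributes found on it.
page_url = 'http://example.com/articles/post.html'
srcs = ['/images/logo.png', 'photo.jpg', 'http://cdn.example.com/banner.gif']

puts absolutize_image_urls(page_url, srcs)
# http://example.com/images/logo.png
# http://example.com/articles/photo.jpg
# http://cdn.example.com/banner.gif
```

URI.join leaves URLs that are already absolute untouched, so the same call is safe for a mixed list.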


倾城　Initia

Reply #4 · 2019-02-03 21:00

> Mechanize.new.get('http://google.com').title
=> "Google"

Make sure you require 'mechanize', or add gem 'mechanize' to your Gemfile.
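
If adding a gem is not an option, the title on its own can be pulled from HTML you have already downloaded, using only core Ruby. This is a different technique from the Mechanize call, offered as a rough sketch. extract_title is a hypothetical helper. A regular expression will miss edge cases that a real HTML parser handles, and this version does not decode HTML entities.

```ruby
# Hypothetical helper (not part of Mechanize): return the text of the first
# <title> element in an HTML string, or nil if there is none.
def extract_title(html)
  # /i ignores tag case; /m lets "." match newlines inside the title.
  raw = html[/<title[^>]*>(.*?)<\/title>/im, 1]
  # Collapse runs of whitespace and trim the ends.
  raw && raw.gsub(/\s+/, ' ').strip
end

# Assumed sample markup standing in for a downloaded page.
sample_html = <<~HTML
  <html>
    <head>
      <TITLE>
        Example   Page
      </TITLE>
    </head>
    <body><p>Hello</p></body>
  </html>
HTML

puts extract_title(sample_html)  # prints "Example Page"
```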

