How to get the raw HTML source code for a page by

Posted 2019-04-12 03:42

I'm using Nokogiri (a Ruby library for parsing HTML/XML, with XPath support) to extract content from web pages. I ran into problems with some pages, such as Ajax-driven ones: when I view the source code, I don't see the actual content, such as the <table> elements.

How can I get the HTML code for the actual content?

Tags: ruby nokogiri raw-data

1 answer

爱情/是我丢掉的垃圾

Reply #2 · 2019-04-12 03:55

You don't need Nokogiri at all if you want the raw source of a web page. Just fetch the page directly as a string, and don't pass that string to Nokogiri. For example:

require 'open-uri'
# Use URI.open: on Ruby 3.0+ a bare `open` no longer accepts URLs.
html = URI.open('http://phrogz.net').read
puts html.length #=> 8461
puts html        #=> ...raw source of the page...
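If you need explicit control over redirects and HTTP errors, the same fetch can be sketched with the standard library's Net::HTTP. This is only an illustration: the method name `fetch_raw_html` and the default redirect limit of 5 are assumptions, not part of the original answer.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return the body of `url` as a plain String,
# following at most `redirect_limit` redirects.
def fetch_raw_html(url, redirect_limit = 5)
  raise ArgumentError, 'too many redirects' if redirect_limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_raw_html(next_url, redirect_limit - 1)
  else
    raise "Request failed: #{response.code} #{response.message}"
  end
end

# Usage (performs a real network request):
#   html = fetch_raw_html('http://phrogz.net')
#   puts html.length
```

As with open-uri, the result is the raw string the server sent; nothing is parsed or normalized along the way.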

If, on the other hand, you want the contents of a page after JavaScript has modified it (for example, a page whose AJAX library runs JavaScript to fetch new content and change the page), then Nokogiri can't help, because it does not execute JavaScript. You need to use Ruby to control a real web browser (e.g. read up on Selenium or Watir).
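A rough sketch of the browser-driven approach, assuming the selenium-webdriver gem and a matching browser driver are installed. The helper name `rendered_html`, its `wait_css` parameter, and the example URL are hypothetical. The helper only relies on three driver methods that Selenium's Ruby bindings provide (`get`, `find_elements`, `page_source`), and it polls until the JavaScript-inserted element shows up before reading the page source.

```ruby
# Hypothetical helper: load `url` in a browser controlled by `driver`, wait
# until at least one element matching `wait_css` exists, then return the
# HTML of the page as the browser currently sees it.
def rendered_html(driver, url, wait_css:, timeout: 10, interval: 0.25)
  driver.get(url)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  until driver.find_elements(css: wait_css).any?
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "Timed out waiting for #{wait_css}"
    end
    sleep interval
  end

  driver.page_source
end

# Usage (starts a real browser; the URL is a placeholder):
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for(:chrome)
#   begin
#     html = rendered_html(driver, 'http://example.com/ajax-page', wait_css: 'table')
#     puts html
#   ensure
#     driver.quit
#   end
```

The string returned here can then be handed to Nokogiri for parsing, since by that point the <table> inserted by JavaScript is part of the markup.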
