HTML Parser into DOM in Ruby

Posted 2019-07-04 07:36

Question:

Is there an HTML parser in Ruby that reads an HTML document into a DOM tree and represents HTML tags as DOM elements?

I know of Nokogiri, but it doesn't parse HTML into a DOM tree.

Answer 1:

Despite your remark, Nokogiri is the way to go:

require 'nokogiri'
doc = Nokogiri::HTML('<body><p>Hello, worlds!</body>')

It parses even invalid HTML and returns a DOM tree:

>> doc.class
=> Nokogiri::HTML::Document
>> doc.root.class
=> Nokogiri::XML::Element
>> doc.root.children.class
=> Nokogiri::XML::NodeSet
>> doc.root.children.first.content
=> "Hello, worlds!"