I know that Hpricot is still a standard, but I remember hearing about a faster, more expressive HTML parser for Ruby.
Does anybody know what it's called, and whether it is worth switching to from Hpricot?
Thanks in advance
There are multiple tools available. I use Nokogiri.
Demo:
Ryan Bates made an excellent screencast about using it: #190: Screen Scraping with Nokogiri.
Documentation: http://nokogiri.org/
Tutorials: http://nokogiri.org/tutorials
There is also Rubyful Soup, which sells itself as a lightweight, quick-and-dirty parser. I found the interface very intuitive and 'Ruby-ish' when using it for a project in the past, which is perhaps a little surprising given that it is a Python port.
Edit: unfortunately it looks like it's no longer maintained, so it's probably not the one you were looking for. Nokogiri is likely the one you've been hearing about.
Don't use regular expressions: Ruby's regex handling is way too slow for this. Hpricot is awesome and Nokogiri looks promising, though I've not used it directly yet.
You are probably thinking about Nokogiri. I have not used it myself, but "everyone" is talking about it and the benchmarks do look interesting.