Parse HTML via XPath

Posted 2019-01-16 10:08

In .NET, I found a great library, HtmlAgilityPack, that lets you easily parse non-well-formed HTML using XPath. I've used it for a couple of years in my .NET sites, but for my Python, Ruby and other projects I've had to settle for more painful libraries. Is anyone aware of similar libraries for other languages?
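
For comparison, Python's standard library can approximate the HtmlAgilityPack workflow without third-party packages: `html.parser.HTMLParser` tolerates unclosed and stray tags, and `xml.etree.ElementTree` understands a limited subset of XPath. The sketch below glues the two together. `TreeBuilder` and `parse_html` are hypothetical helper names, not part of any library, and unlike a real HTML5 parser the sketch does not apply implied end-tag rules (an unclosed `<p>` or `<li>` simply stays open until an ancestor closes), so treat it as an illustration rather than a replacement for a dedicated library.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements never get an end tag, so they must not be left "open".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TreeBuilder(HTMLParser):
    """Hypothetical helper: turn tag-soup HTML into an ElementTree."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # synthetic root so fragments work
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as <input disabled> arrive as None.
        attrib = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything still open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(markup):
    """Hypothetical helper: parse markup and return the synthetic root."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any text still buffered by the parser
    return builder.root


# Neither <div> nor the <p> is ever closed, and </span> has no opening tag.
doc = parse_html(
    '<div class="item"><a href="/one">One</a></span>'
    '<div class="item"><a href="/two">Two</a><br>'
    '<p>No closing tags here'
)

# ElementTree supports only a subset of XPath 1.0, but enough for this query.
links = [a.get("href") for a in doc.iterfind(".//div[@class='item']/a")]
print(links)                    # ['/one', '/two']
print(doc.findtext(".//p"))     # No closing tags here
```

ElementTree's path language covers tag steps, `//`, `[@attr]`, `[@attr='value']` and positional indexes, but not functions such as `contains()` or axes beyond parent and child, so queries that need full XPath 1.0 still call for a dedicated library.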

8 answers
倾城 Initia
Reply #2 · 2019-01-16 11:12

BeautifulSoup is a good Python library for working with messy HTML cleanly.

叛逆
Reply #3 · 2019-01-16 11:12

For Ruby, I highly recommend Hpricot, which Jb Evain pointed out. If you're looking for a faster libxml-based competitor, Nokogiri (see http://tenderlovemaking.com/2008/10/30/nokogiri-is-released/) is pretty good too: like Hpricot, it supports both XPath and CSS searches, but it is faster. There's a basic wiki and some benchmarks.
