Library to query HTML with XPath in Java?

Posted 2019-01-15 04:42

Can anyone recommend a Java library that lets me run XPath queries over HTML pages at given URLs? I've tried JAXP without success.

Thank you.

4 answers
干净又极端
Reply #2 · 2019-01-15 04:56

I've used JTidy to turn HTML into a proper DOM and then queried that DOM with plain XPath.

If you want to run queries across several documents or URLs, you're better off using JTidy together with XQuery.
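A minimal sketch of the JTidy-then-XPath approach, assuming the JTidy jar (`org.w3c.tidy.Tidy`) is on the classpath; the class name `TidyXPath` and the sample HTML are made up for illustration. `Tidy.parseDOM` returns an `org.w3c.dom.Document`, which the JDK's standard `javax.xml.xpath` API can then query. JTidy's DOM may not implement every DOM method, so it is worth testing the queries you need against it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class TidyXPath {
    // Parse possibly malformed HTML into a W3C DOM using JTidy
    static Document parse(InputStream in) {
        Tidy tidy = new Tidy();
        tidy.setInputEncoding("UTF-8");   // assumption: input is UTF-8
        tidy.setQuiet(true);              // suppress progress messages
        tidy.setShowWarnings(false);      // suppress markup warnings
        return tidy.parseDOM(in, null);   // null: don't write the tidied markup anywhere
    }

    // Evaluate an XPath 1.0 expression with the JDK's JAXP engine; returns the string value
    static String evaluate(Document doc, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        InputStream in = args.length > 0
                ? URI.create(args[0]).toURL().openStream()  // query a live page
                : new ByteArrayInputStream(
                        "<title>Demo</title><p>unclosed".getBytes(StandardCharsets.UTF_8));
        try (InputStream stream = in) {
            System.out.println(evaluate(parse(stream), "//title"));
        }
    }
}
```

Pass a URL as the first argument to query a live page; without arguments it parses an inline fragment whose `<p>` is never closed, which a strict XML parser would reject.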

再贱就再见
Reply #3 · 2019-01-15 05:02

Several approaches to this are documented on the Web:

Using HtmlCleaner

Using Jericho

I have tried a few variations of these approaches, e.g. HtmlParser plus the Java DOM parser, and jsoup plus Jaxen, but the combination that worked best was HtmlCleaner plus the Java DOM parser. The next best was Jericho plus Jaxen.
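A sketch of the HtmlCleaner plus Java DOM combination, assuming the HtmlCleaner jar is on the classpath; `CleanerXPath` and the sample markup are hypothetical. HtmlCleaner's `DomSerializer` converts the cleaned `TagNode` tree into a standard `org.w3c.dom.Document`, after which the JDK's JAXP XPath works normally. The query uses `local-name()` so it matches whether or not the serializer places elements in the XHTML namespace.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CleanerXPath {
    // Clean the HTML, convert it to a W3C DOM, and return the text of every XPath match
    static List<String> textOf(String html, String expr)
            throws ParserConfigurationException, XPathExpressionException {
        HtmlCleaner cleaner = new HtmlCleaner();
        TagNode root = cleaner.clean(html);  // closes tags, adds html/head/body as needed
        Document doc = new DomSerializer(cleaner.getProperties()).createDOM(root);

        XPath xpath = XPathFactory.newInstance().newXPath();  // the JDK's JAXP engine
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String html = "<ul><li>one<li>two</ul>";  // unclosed <li> tags: not well-formed XML
        if (args.length > 0) {
            try (InputStream in = URI.create(args[0]).toURL().openStream()) {
                html = new String(in.readAllBytes(), StandardCharsets.UTF_8);  // assumes UTF-8
            }
        }
        System.out.println(textOf(html, "//*[local-name()='li']"));
    }
}
```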

叛逆
Reply #4 · 2019-01-15 05:08

You could use TagSoup together with Saxon. TagSoup is a SAX parser for HTML, so you simply use it in place of the XML SAX parser, and Saxon's XPath 2.0, XSLT 2.0 or XQuery 1.0 implementation works as usual.
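A sketch of the TagSoup + Saxon combination via Saxon's s9api, assuming the TagSoup and Saxon-HE jars are on the classpath; `TagSoupSaxon` is a hypothetical name. TagSoup's `org.ccil.cowan.tagsoup.Parser` implements SAX's `XMLReader`, so wrapping it in a `SAXSource` makes Saxon build its tree from TagSoup's events instead of an XML parser's. TagSoup puts elements in the XHTML namespace by default, hence the `h:` prefix; `string-join()` is used because it is an XPath 2.0 function that plain JAXP (XPath 1.0) lacks.

```java
import java.io.StringReader;
import javax.xml.transform.sax.SAXSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmNode;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;

public class TagSoupSaxon {
    // Parse HTML with TagSoup into a Saxon tree, then evaluate an XPath 2.0 expression
    static String evaluate(InputSource html, String expr) throws SaxonApiException {
        Processor proc = new Processor(false);  // false: no licensed features (Saxon-HE)
        DocumentBuilder builder = proc.newDocumentBuilder();
        XdmNode doc = builder.build(new SAXSource(new Parser(), html));

        XPathCompiler compiler = proc.newXPathCompiler();
        compiler.declareNamespace("h", "http://www.w3.org/1999/xhtml");
        XPathSelector selector = compiler.compile(expr).load();
        selector.setContextItem(doc);
        XdmItem result = selector.evaluateSingle();  // null if the result is empty
        return result == null ? null : result.getStringValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        InputSource src = args.length > 0
                ? new InputSource(args[0])  // system ID: TagSoup opens the URL itself
                : new InputSource(new StringReader("<ul><li>a<li>b</ul>"));
        System.out.println(evaluate(src, "string-join(//h:li, ',')"));
    }
}
```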

等我变得足够好
Reply #5 · 2019-01-15 05:16

jsoup, a Java HTML parser. It queries with CSS selectors rather than XPath, in a syntax very similar to jQuery's.
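A small jsoup sketch for comparison with XPath, assuming the jsoup jar is on the classpath; `JsoupLinks` and the sample markup are hypothetical. The CSS selector `a[href]` selects roughly what the XPath `//a[@href]` would, and the `abs:` prefix on `attr` resolves a relative link against the document's base URI.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLinks {
    // Collect absolute link targets with a jQuery-style CSS selector
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            out.add(a.attr("abs:href"));  // resolved against the base URI
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Document doc = args.length > 0
                ? Jsoup.connect(args[0]).get()  // fetch and parse a live URL
                : Jsoup.parse("<p><a href='/docs'>Docs</a> <a>no href</a>", "https://example.com/");
        System.out.println(links(doc));
    }
}
```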
