Converting HTML to RDF

Posted 2019-06-21 08:25

Question:

I'm looking for a general-purpose API, web service, or tool that can convert a given HTML page into an RDF graph that is as specific as possible (most probably using a backbone ontology and/or a mapper).

Answer 1:

Have you tried GRDDL?

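GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a technique for obtaining RDF data from XML documents, in particular XHTML pages: the document links to a transformation, typically an XSLT stylesheet, that produces RDF from it.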
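As a rough illustration of the discovery half of GRDDL, the Python sketch below (standard library only) checks whether an XHTML page lists the GRDDL profile `http://www.w3.org/2003/g/data-view` in its `head` and collects the URLs of its `link rel="transformation"` elements. It is a simplified sketch, not a full GRDDL processor: it only inspects `link` elements in `head` (GRDDL also has other association mechanisms), and it does not run the transformations, because applying an XSLT stylesheet needs a processor the standard library lacks. The stylesheet path in the example page is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_grddl_transformations(xhtml: str, base_url: str) -> List[str]:
    """Return absolute URLs of transformations declared via <link> in <head>.

    Simplified: only the head-profile + link rel="transformation" case is handled,
    and the transformations themselves are not applied.
    """
    root = ET.fromstring(xhtml)  # requires well-formed XHTML
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The head's profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    urls = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        rels = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rels and href:
            # Resolve relative stylesheet references against the page URL.
            urls.append(urljoin(base_url, href))
    return urls


# Example page; the stylesheet path is hypothetical.
grddl_page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Example</title>
  <link rel="transformation" href="/xslt/dc-extract.xsl"/>
</head>
<body/>
</html>"""

print(find_grddl_transformations(grddl_page, "http://example.org/page.html"))
# ['http://example.org/xslt/dc-extract.xsl']
```

A real pipeline would then fetch each stylesheet and apply it to the page with an XSLT processor to obtain the RDF graph.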



Answer 2:

I used XQuery to extract the data from the given set of web pages, which meant writing custom queries for those pages. I think this is the most straightforward approach for a specific set of HTML files, but it obviously does not work for the general case: a different set of web pages needs a different set of custom queries.
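Running XQuery needs an XQuery processor, so the sketch below swaps in Python's `xml.etree.ElementTree` and its limited XPath subset to show the same idea: a hand-written, site-specific set of rules that maps parts of a page to RDF properties and emits N-Triples. The rule table, the `class='author'` markup, and the subject URI are hypothetical; the predicates are Dublin Core terms. Like the XQuery approach, it assumes well-formed XHTML and only works on pages that match its rules.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical rules for ONE specific set of pages: each ElementTree path
# selects elements whose text becomes the object of a triple with the
# given predicate. A differently structured site needs a different table.
RULES = {
    ".//h:h1": "http://purl.org/dc/terms/title",
    ".//h:span[@class='author']": "http://purl.org/dc/terms/creator",
}


def nt_literal(text):
    """Quote text as an N-Triples string literal (backslash escaped first)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def extract_triples(xhtml, subject, rules=RULES):
    """Apply the path -> predicate rules and return N-Triples lines."""
    root = ET.fromstring(xhtml)  # fails on markup that is not well-formed
    lines = []
    for path, predicate in rules.items():
        for el in root.findall(path, NS):
            text = "".join(el.itertext()).strip()
            if text:
                lines.append(f"<{subject}> <{predicate}> {nt_literal(text)} .")
    return lines


article_page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Converting HTML to RDF</h1>
<span class="author">Jane Doe</span>
</body></html>"""

for line in extract_triples(article_page, "http://example.org/page"):
    print(line)
```

Pointing the same function at a differently structured site would require a new `RULES` table, which is the limitation this answer describes for the general case.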



Answer 3:

I used jsoup to scrape data from HTML. It queries the HTML DOM in the style of jQuery, which I was already familiar with, so it was a really simple tool for me to use. I also found it quite robust, but I only needed it to scrape three data sources, so I don't have much experience with it yet.
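jsoup is a Java library; to keep the examples on this page in one language, the following is a Python standard-library analog, not jsoup itself. It mirrors a typical jsoup task, selecting `a[href]` elements and resolving each `href` to an absolute URL (roughly `doc.select("a[href]")` plus `absUrl("href")` in jsoup), and it shows the robustness point: `html.parser` tokenizes tag soup with unclosed tags that an XML parser such as `ET.fromstring` would reject. Unlike jsoup, `html.parser` has no CSS selector engine, so the matching logic is written by hand. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> elements that carry an href attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; a value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Unclosed <p> and <a> tags and an unquoted attribute value:
# acceptable to a tag-soup tokenizer, but not well-formed XML.
soup = "<p>See <a href='/docs'>docs<p><a href=guide.html>guide</a> <a name=top>"
print(collect_links(soup, "http://example.org/dir/page.html"))
# ['http://example.org/docs', 'http://example.org/dir/guide.html']
```

Scraped values like these still have to be mapped onto RDF terms (for example with a hand-written rule table) before they form a graph.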