I'm trying to parse an RDF document recursively using Apache Jena. It consists of datasets like this:
<dcat:dataset>
  <dcat:Dataset rdf:about="http://url/" >
    <dct:description xml:lang="ca">Description</dct:description>
    <dct:license rdf:resource="http://creativecommons.org/licenses/by/3.0/"/>
    <dcat:keyword xml:lang="ca">Keyword1</dcat:keyword>
    <dcat:distribution>
      <dcat:Download>
        <dcat:accessURL>http:/url/</dcat:accessURL>
        <dct:format>
          <dct:IMT>
            <rdf:value>application/pdf</rdf:value>
            <rdfs:label>pdf</rdfs:label>
          </dct:IMT>
        </dct:format>
        <dct:modified rdf:datatype="http://www.w3.or/2001/XMLSchema#date">2012-11-09T16:23:22</dct:modified>
      </dcat:Download>
    </dcat:distribution>
    <dct:publisher>
      <foaf:Organization>
        <dct:title xml:lang="en">Company</dct:title>
        <foaf:homepage rdf:resource="http://url/"/>
      </foaf:Organization>
    </dct:publisher>
  </dcat:Dataset>
</dcat:dataset>
So far I can get every statement that is directly beneath dcat:Dataset (see "Iterate over specific resource in RDF file with Jena"), but I want to find every triple at every level. My output should look like this:
description: Description
license: http://creativecommons.org/licenses/by/3.0/
keyword: Keyword1
distribution -> Download -> accessurl: http:/url/
distribution -> Download -> format -> IMT -> value: application/pdf
distribution -> Download -> format -> IMT -> label: pdf
...
I've tried a recursive function that iterates over the statements and, when a statement's object is not a literal, follows the object to the next node, like this:
private String recursiveQuery(Statement stmt) {
    Resource subject = stmt.getSubject();
    Property predicate = stmt.getPredicate();
    RDFNode object = stmt.getObject();
    if(object.isLiteral()) {
        out.println("LIT: " + predicate.getLocalName());
        return object.toString();
    } else {
        out.println(predicate.getLocalName());
        Resource r = stmt.getResource();
        StmtIterator stmts = r.listProperties();
        while (stmts.hasNext()) {
            Statement s = stmts.next();
            out.println(s.getPredicate().getLocalName());
            return recursiveQuery(s);
        }
    }
    return null;
}
But somehow I'm getting nowhere with this method. Thank you very much for any insight.
Based on the earlier question that you linked to, I completed your data so that we have some working data to use. Here is the completed data:
It sounds like you are just trying to do a depth first search on each element of type dcat:Dataset. That's easy enough to do. We just select each element of type dcat:Dataset and then start a depth first search from that RDFNode.
This produces the output:
which is less pretty than the output you described, but seems to be what you want.
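To make the depth first search concrete, here is a minimal, library-free sketch of the idea. It is not Jena code: the PathWalker class, its Triple type, and the "_:download", "_:imt" and "_:org" node ids are all hypothetical stand-ins for a small part of the question's data. In Jena the same walk would presumably be built on Resource.listProperties(), Statement.getObject(), RDFNode.isLiteral() and RDFNode.asResource(). Two things are worth noticing. First, the recursive call sits inside the loop without a return, whereas the question's function has "return recursiveQuery(s)" inside its while loop and therefore stops after the first property of each node. Second, the walk remembers which nodes are on the current path, because foaf:homepage in the sample data points back at the dataset's own URI and would otherwise cause endless recursion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathWalker {
    /** One statement; the object is either a literal value or the id of another node. */
    static final class Triple {
        final String subject, predicate, object;
        final boolean literalObject;
        Triple(String subject, String predicate, String object, boolean literalObject) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.literalObject = literalObject;
        }
    }

    private final Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
    private final Map<String, String> typeOf = new LinkedHashMap<>();

    public void type(String node, String typeName) { typeOf.put(node, typeName); }
    public void literal(String s, String p, String value) { add(new Triple(s, p, value, true)); }
    public void link(String s, String p, String o) { add(new Triple(s, p, o, false)); }

    private void add(Triple t) {
        bySubject.computeIfAbsent(t.subject, k -> new ArrayList<>()).add(t);
    }

    /** Depth first walk from start; returns one "a -> b -> c: value" line per leaf. */
    public List<String> walk(String start) {
        List<String> lines = new ArrayList<>();
        visit(start, "", new HashSet<>(), lines);
        return lines;
    }

    private void visit(String node, String prefix, Set<String> onPath, List<String> lines) {
        onPath.add(node);
        for (Triple t : bySubject.getOrDefault(node, Collections.emptyList())) {
            // A leaf is a literal, a resource with no statements of its own,
            // or a resource already on the current path (a cycle).
            boolean leaf = t.literalObject
                    || !bySubject.containsKey(t.object)
                    || onPath.contains(t.object);
            if (leaf) {
                lines.add(prefix + t.predicate + ": " + t.object);
            } else {
                String type = typeOf.get(t.object);
                String step = t.predicate + " -> " + (type == null ? "" : type + " -> ");
                visit(t.object, prefix + step, onPath, lines); // no return: keep iterating
            }
        }
        onPath.remove(node);
    }

    /** Hypothetical hand-built copy of part of the question's dataset. */
    public static PathWalker sample() {
        PathWalker g = new PathWalker();
        String ds = "http://url/";
        g.literal(ds, "description", "Description");
        g.link(ds, "license", "http://creativecommons.org/licenses/by/3.0/");
        g.literal(ds, "keyword", "Keyword1");
        g.link(ds, "distribution", "_:download");
        g.type("_:download", "Download");
        g.literal("_:download", "accessURL", "http:/url/");
        g.link("_:download", "format", "_:imt");
        g.type("_:imt", "IMT");
        g.literal("_:imt", "value", "application/pdf");
        g.literal("_:imt", "label", "pdf");
        g.link(ds, "publisher", "_:org");
        g.type("_:org", "Organization");
        g.literal("_:org", "title", "Company");
        g.link("_:org", "homepage", ds); // points back at the dataset itself
        return g;
    }

    public static void main(String[] args) {
        sample().walk("http://url/").forEach(System.out::println);
    }
}
```

Each line is built by carrying the path so far down the recursion as a prefix, which gives lines in the "distribution -> Download -> format -> IMT -> value: application/pdf" shape that the question asked for.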
Note on RDF as a Graph Representation
The question used the notation “every statement, which is directly beneath dcat:Dataset,” and I think that it is worth pointing out, just in case there is any confusion, that RDF is a graph-based representation. It is true that the RDF/XML serialization can be used to provide some nicely structured XML that is human readable, but there is nothing that requires that that XML representation has that sort of structure. To see this difference, note that the following RDF/XML represents the same graph as the one posted earlier in this answer.

The RDF graph is exactly the same, even though the XML structure is very different. I only bring this up to highlight the fact that it really is important to work with RDF as a graph, not as hierarchical XML, even if a particular serialization might suggest that we could work with the latter.
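The point that the graph is independent of how the XML is laid out can be illustrated with a small library-free sketch. It treats a graph as a set of triples and lists the same four statements in two different orders, one following the nesting of the question's XML and one flat and shuffled. The SameGraph class and the "_:download" and "_:imt" labels are hypothetical. Real blank nodes have no stable labels, so comparing two parsed documents needs an isomorphism check (Jena's Model.isIsomorphicWith is the usual tool) rather than plain set equality.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SameGraph {
    /** A graph is just a set of (subject, predicate, object) triples. */
    public static Set<List<String>> graph(String[][] triples) {
        Set<List<String>> g = new HashSet<>();
        for (String[] t : triples) {
            g.add(Arrays.asList(t));
        }
        return g;
    }

    /** Statements listed in the order the nested XML would present them. */
    public static Set<List<String>> nestedOrder() {
        return graph(new String[][] {
            {"http://url/", "dcat:distribution", "_:download"},
            {"_:download", "dct:format", "_:imt"},
            {"_:imt", "rdf:value", "application/pdf"},
            {"_:imt", "rdfs:label", "pdf"},
        });
    }

    /** The same statements, listed flat and in a different order. */
    public static Set<List<String>> flatOrder() {
        return graph(new String[][] {
            {"_:imt", "rdfs:label", "pdf"},
            {"_:imt", "rdf:value", "application/pdf"},
            {"http://url/", "dcat:distribution", "_:download"},
            {"_:download", "dct:format", "_:imt"},
        });
    }

    public static void main(String[] args) {
        System.out.println(nestedOrder().equals(flatOrder())); // prints true
    }
}
```

Because only the set of triples matters, code that walks the graph gives the same result whichever way the document happened to be serialized.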