Parsing RDF items

Posted 2019-07-20 16:07

I have a couple of lines of (I think) RDF data:

<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> 
<http://www.test.com/meta#0002> <http://www.test.com/meta#CONCEPT_hasType> "BEAR"^^<http://www.w3.org/2001/XMLSchema#string>

Each line has 3 items in it. From each item I want to pull out the part after the # in the URL, and for a literal the value together with its datatype. So that would result in:

0001, type, Class
0002, CONCEPT_hasType, (BEAR, string)

Is there a library out there (Java or Scala) that would do this split for me? Or do I just need to put String.split calls and assumptions in my code?

Tags: java scala rdf
1 answer
我命由我不由天
#2 · 2019-07-20 16:31

Most RDF libraries will have something to facilitate this. For example, if you parse your RDF data using Eclipse RDF4J's Rio parser, you will get back each line as an org.eclipse.rdf4j.model.Statement, with a subject, predicate and object value. The subject in both your lines will be an org.eclipse.rdf4j.model.IRI, which has a getLocalName() method you can use to get the part after the last #. See the Javadocs for more details.

Assuming your data is in N-Triples syntax (which it seems to be given the example you showed us), here's a simple bit of code that does this and prints it out to STDOUT:

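  import java.io.FileInputStream;
  import java.io.InputStream;
  import org.eclipse.rdf4j.model.IRI;
  import org.eclipse.rdf4j.model.Model;
  import org.eclipse.rdf4j.model.Resource;
  import org.eclipse.rdf4j.model.Statement;
  import org.eclipse.rdf4j.rio.RDFFormat;
  import org.eclipse.rdf4j.rio.Rio;

  // parse the file into a Model object; "" is the base URI, which
  // N-Triples does not need because all of its IRIs are absolute
  try (InputStream in = new FileInputStream("/path/to/rdf-data.nt")) {
      Model model = Rio.parse(in, "", RDFFormat.NTRIPLES);

      for (Statement st : model) {
          Resource subject = st.getSubject();
          if (subject instanceof IRI) {
              // IRI: print only the local name
              System.out.print(((IRI) subject).getLocalName());
          } else {
              // not an IRI (e.g. a blank node): print its string value
              System.out.print(subject.stringValue());
          }
          // ... etc for predicate and object (the 2nd and 3rd elements in each RDF statement)
      }
  }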

Update: if you don't want to read data from a file but simply want to use a String, you can use a java.io.StringReader instead of an InputStream:

 StringReader r = new StringReader("<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .");
 // "" is the base URI; it is not needed for N-Triples, where every IRI is absolute
 org.eclipse.rdf4j.model.Model model = Rio.parse(r, "", RDFFormat.NTRIPLES);

Alternatively, if you don't want to parse the data at all and just want to do String processing, there is an org.eclipse.rdf4j.model.util.URIUtil class which you can feed a string, and it will give you back the index of the local name part:

  String uri = "http://www.test.com/meta#0001";
  String localpart = uri.substring(URIUtil.getLocalNameIndex(uri));  // will be "0001" 
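If adding a library is not an option, the pure string-processing route can be sketched in plain Java. This is only a rough sketch under stated assumptions, not a conformant N-Triples parser: the class NTriplesLineSketch and its methods localName and summarize are hypothetical names, not part of RDF4J. The split rule (after the first '#', else the last '/', else the last ':') is an assumption about what counts as the local name. The regular expression only covers lines shaped like the two in the question: IRI subject and predicate, and either an IRI or a quoted literal with an optional datatype. Blank nodes and language tags are not handled, and the trailing " ." is treated as optional because the sample lines omit it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RDF4J: naive string processing for simple triple lines.
class NTriplesLineSketch {

    // <subject> <predicate> followed by either <object-iri> or "literal" with optional ^^<datatype>.
    private static final Pattern LINE = Pattern.compile(
        "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+"
        + "(?:<([^>]*)>|\"((?:[^\"\\\\]|\\\\.)*)\"(?:\\^\\^<([^>]*)>)?)"
        + "\\s*\\.?\\s*$");

    // Assumed split rule: text after the first '#', else after the last '/', else after the last ':'.
    static String localName(String iri) {
        int idx = iri.indexOf('#');
        if (idx < 0) idx = iri.lastIndexOf('/');
        if (idx < 0) idx = iri.lastIndexOf(':');
        return iri.substring(idx + 1); // idx == -1 returns the whole string
    }

    // Turns one triple line into "subject, predicate, object" using local names.
    static String summarize(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a simple triple line: " + line);
        }
        String object;
        if (m.group(3) != null) {
            object = localName(m.group(3));                                  // object is an IRI
        } else if (m.group(5) != null) {
            object = "(" + m.group(4) + ", " + localName(m.group(5)) + ")";  // typed literal
        } else {
            object = m.group(4);                                             // plain literal
        }
        return localName(m.group(1)) + ", " + localName(m.group(2)) + ", " + object;
    }

    public static void main(String[] args) {
        // prints: 0001, type, Class
        System.out.println(summarize("<http://www.test.com/meta#0001> "
            + "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> "));
        // prints: 0002, CONCEPT_hasType, (BEAR, string)
        System.out.println(summarize("<http://www.test.com/meta#0002> "
            + "<http://www.test.com/meta#CONCEPT_hasType> \"BEAR\"^^<http://www.w3.org/2001/XMLSchema#string>"));
    }
}
```

For anything beyond lines of this exact shape (escapes, language tags, blank nodes, comments), a real parser such as Rio is the safer choice.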

(disclosure: I am on the RDF4J development team)
