Parsing schema.org ttl/owl file using Jena

Posted 2019-01-20 17:38

Question:

I'm writing a code generator that generates entities (POJOs in Java) from the schema defined at http://schema.rdfs.org/all.ttl. I'm using Jena to parse the ttl file and retrieve the metadata I need to generate them.

Jena parses the file successfully, however, for some reason it does not list all the attributes of a given entity, e.g., Person. I'm not sure whether I'm doing something wrong, using the wrong API, etc. Here's the code sample that recreates the scenario:

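    import java.net.URL;
    import java.util.Iterator;

    // Jena 2.x package names, matching the Javadoc quoted in the answer;
    // Jena 3 and later use org.apache.jena.* instead.
    import com.hp.hpl.jena.ontology.OntClass;
    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.ontology.OntProperty;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class PersonParser {

        public static void main(String[] args) {
            OntModel model = ModelFactory.createOntologyModel();
            // Load the local copy of the schema from the classpath.
            URL url = Thread.currentThread().getContextClassLoader().getResource("schema_org.ttl");
            model.read(url.toString(), "TURTLE");
            OntClass ontclass = model.getOntClass("http://schema.org/Person");
            // Print the local name of every property Jena associates with Person.
            Iterator<OntProperty> props = ontclass.listDeclaredProperties();
            while (props.hasNext()) {
                OntProperty p = props.next();
                System.out.println("p:" + p.getLocalName());
            }
        }
    }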

Basically, I look up a single class, Person, and try to list all of its properties. What I get is:

p:alternateName
p:deathDate
p:alumniOf
p:sameAs
p:url
p:additionalName
p:homeLocation
p:description
p:nationality
p:sibling
p:follows
p:siblings
p:colleagues
p:memberOf
p:knows
p:name
p:gender
p:birthDate
p:children
p:familyName
p:jobTitle
p:workLocation
p:parents
p:affiliation
p:givenName
p:honorificPrefix
p:parent
p:colleague
p:additionalType
p:honorificSuffix
p:image
p:worksFor
p:relatedTo
p:spouse
p:performerIn

But if you look at http://schema.org/Person, it has a number of properties that are missing from this output (for example, address). The declaration of schema:address in http://schema.rdfs.org/all.ttl is:

schema:address a rdf:Property;
    rdfs:label "Address"@en;
    rdfs:comment "Physical address of the item."@en;
    rdfs:domain [ a owl:Class; owl:unionOf (schema:Person schema:Place schema:Organization) ];
    rdfs:range schema:PostalAddress;
    rdfs:isDefinedBy <http://schema.org/Person>;
    rdfs:isDefinedBy <http://schema.org/Place>;
    rdfs:isDefinedBy <http://schema.org/Organization>;
    .

Has anyone come across this? Should I be using a different Jena interface to parse the schema?

Answer 1:

Note what the documentation for listDeclaredProperties says, in particular its second paragraph about reasoners:

listDeclaredProperties

com.hp.hpl.jena.util.iterator.ExtendedIterator<OntProperty> listDeclaredProperties(boolean direct)

Return an iterator over the properties associated with a frame-like view of this class. This captures an intuitive notion of the properties of a class. This can be useful in presenting an ontology class in a user interface, for example by automatically constructing a form to instantiate instances of the class. The properties in the frame-like view of the class are determined by comparing the domain of properties in this class's OntModel with the class itself. See: Presenting RDF as frames for more details.

Note that many cases of determining whether a property is associated with a class depends on RDFS or OWL reasoning. This method may therefore return complete results only in models that have an attached reasoner.

Parameters:

  • direct - If true, restrict the properties returned to those directly associated with this class. If false, the properties of super-classes of this class will not be listed among the declared properties of this class.

Returns:

An iteration of the properties that are associated with this class by their domain.

So, even before looking at the particular schema, it's important to note that unless you're using a reasoner, you might not get all the results you expect. Then, notice how the address property is declared:

schema:address a rdf:Property;
    rdfs:label "Address"@en;
    rdfs:comment "Physical address of the item."@en;
    rdfs:domain [ a owl:Class; owl:unionOf (schema:Person schema:Place schema:Organization) ];
    rdfs:range schema:PostalAddress;
    rdfs:isDefinedBy <http://schema.org/Person>;
    rdfs:isDefinedBy <http://schema.org/Place>;
    rdfs:isDefinedBy <http://schema.org/Organization>;
    .

The domain of address is a union class: Person or Place or Organization. That's a superclass of Person, but it's a complex class expression, not just a simple named class, so you'll probably need a reasoner, as the documentation mentions, to get Jena to recognize that it's a superclass of Person.
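To make the union-domain point concrete, here is a minimal plain-Java sketch that deliberately does not use Jena. It is not how Jena works internally. It only contrasts a lookup that matches a property when its domain is exactly one named class with a lookup that also accepts a class appearing as an operand of a union domain. The class name UnionDomainSketch, the DOMAINS map, and both method names are hypothetical, and the two entries are toy data modelled on the declarations discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnionDomainSketch {

    // Toy stand-in for the schema: property -> named classes in its rdfs:domain.
    // One element models a plain named domain; several model an owl:unionOf domain.
    static final Map<String, Set<String>> DOMAINS = new LinkedHashMap<>();
    static {
        DOMAINS.put("birthDate", new LinkedHashSet<>(Arrays.asList("Person")));
        DOMAINS.put("address",
                new LinkedHashSet<>(Arrays.asList("Person", "Place", "Organization")));
    }

    // Matches only properties whose domain is exactly the given named class.
    static List<String> namedDomainOnly(String cls) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : DOMAINS.entrySet()) {
            if (e.getValue().size() == 1 && e.getValue().contains(cls)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Also matches properties whose union domain lists the class as an operand.
    static List<String> unionAware(String cls) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : DOMAINS.entrySet()) {
            if (e.getValue().contains(cls)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("named only:  " + namedDomainOnly("Person"));
        System.out.println("union aware: " + unionAware("Person"));
    }
}
```

In this toy model the first lookup misses address for Person and the second finds it, which mirrors the gap described in the question. Recognizing that each operand of a union is a subclass of the union is the step a reasoner would have to supply.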

Comparison with OWL semantics

I think that using a reasoner will allow Jena to recognize that the domain of address is a superclass of Person, and thus include it in the result of listDeclaredProperties. It's worth noting how this differs from OWL semantics, though.

In OWL, saying that a class D is the domain of a property P means that whenever we have a triple with the property P, we can infer that its subject is a D. This can be expressed by the rule

P rdfs:domain D     X P Y
-------------------------
    X rdf:type D

So, even though a Person might have an address, just because something has an address isn't enough to tell us that that something is a Person; it could still be a Place or Organization.
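The rule above can be applied mechanically. The following is a small plain-Java sketch, again without Jena, that runs the domain rule once over a list of triples represented as three-element string lists. The class name DomainRuleSketch, the method inferTypes, and the identifiers ex:acme, ex:someAddress, and _:PersonOrPlaceOrOrganization (standing in for the anonymous union class) are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DomainRuleSketch {

    static final String DOMAIN = "rdfs:domain";
    static final String TYPE = "rdf:type";

    // Applies the rule once: from (P rdfs:domain D) and (X P Y), derive (X rdf:type D).
    static Set<List<String>> inferTypes(Collection<List<String>> triples) {
        Set<List<String>> inferred = new LinkedHashSet<>();
        for (List<String> decl : triples) {
            if (!DOMAIN.equals(decl.get(1))) {
                continue; // not a domain declaration
            }
            String property = decl.get(0);
            String domain = decl.get(2);
            for (List<String> t : triples) {
                if (property.equals(t.get(1))) {
                    inferred.add(Arrays.asList(t.get(0), TYPE, domain));
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<List<String>> triples = Arrays.asList(
                // Hypothetical name for the blank-node union class.
                Arrays.asList("schema:address", DOMAIN, "_:PersonOrPlaceOrOrganization"),
                Arrays.asList("ex:acme", "schema:address", "ex:someAddress"));
        System.out.println(inferTypes(triples));
    }
}
```

The only type derived for ex:acme is the union class. Nothing in the rule yields ex:acme rdf:type schema:Person, which is the point made above: having an address does not make something a Person.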