Removing unwanted superclass answers in SPARQL

Posted 2019-07-02 23:49

Question:

I have an OWL file that includes a taxonomic hierarchy, and I want to write a query whose answer includes each individual and its immediate taxonomic parent. Here's an example (the full query is rather messier).

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <urn:ex:> .

:fido rdf:type :Dog .
:Dog rdfs:subClassOf :Mammal .
:Mammal rdfs:subClassOf :Vertebrate .
:Vertebrate rdfs:subClassOf :Animal .
:fido :hasToy :bone .

:kitty rdf:type :Cat .
:Cat rdfs:subClassOf :Mammal .
:kitty :hasToy :catnipMouse .

And this query does what I want.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix : <urn:ex:>

SELECT ?individual ?type 
WHERE {
   ?individual :hasToy :bone .
   ?individual rdf:type ?type .
}

The problem is that I'd rather use a reasoned-over version of the OWL file, which unsurprisingly includes additional statements:

:fido rdf:type :Mammal .
:fido rdf:type :Vertebrate .
:fido rdf:type :Animal .
:kitty rdf:type :Mammal .
:kitty rdf:type :Vertebrate .
:kitty rdf:type :Animal .

And now the query returns additional answers about Fido being a Mammal, etc. I could just give up on using the reasoned version of the file, or, since the SPARQL queries are called from Java, I could run a bunch of additional queries to find the least inclusive type that appears. My question is whether there is a reasonable pure SPARQL solution that returns only the Dog solution.

Answer 1:

A generic solution is that you make sure you ask for the direct type only. A class C is the direct type of an instance X if:

  1. X is of type C
  2. there is no C' such that:
    • X is of type C'
    • C' is a subclass of C
    • C' is not equal to C

(That last condition is necessary, by the way, because in RDF/OWL the subclass relation is reflexive: every class is a subclass of itself.)

In SPARQL, this becomes something like this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <urn:ex:>

SELECT ?individual ?type 
WHERE {
   ?individual :hasToy :bone .
   ?individual a ?type .
   FILTER NOT EXISTS { ?individual a ?other .
                       ?other rdfs:subClassOf ?type .
                       FILTER(?other != ?type)
   }
}
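
To see why the inequality filter matters, here is a minimal plain-Java sketch that applies the same direct-type test to the example data held in in-memory maps. The class name DirectTypeSketch, the two maps, and the directTypes helper are all hypothetical names made up for this illustration. It uses no RDF library, and it assumes the reasoner has also materialized the reflexive subclass statements (such as :Dog rdfs:subClassOf :Dog).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DirectTypeSketch {

    // Reasoned rdf:type statements from the example: individual -> asserted and inferred types.
    static final Map<String, Set<String>> TYPES = Map.of(
        "fido", Set.of("Dog", "Mammal", "Vertebrate", "Animal"),
        "kitty", Set.of("Cat", "Mammal", "Vertebrate", "Animal"));

    // rdfs:subClassOf statements: class -> its superclasses.
    // Assumption: reflexive entries are present, as a reasoner may materialize them.
    static final Map<String, Set<String>> SUPERCLASSES = Map.of(
        "Dog", Set.of("Dog", "Mammal"),
        "Cat", Set.of("Cat", "Mammal"),
        "Mammal", Set.of("Mammal", "Vertebrate"),
        "Vertebrate", Set.of("Vertebrate", "Animal"),
        "Animal", Set.of("Animal"));

    // Keeps a type only if no *other* type of the individual is a subclass of it,
    // mirroring FILTER NOT EXISTS { ... FILTER(?other != ?type) } in the query.
    static Set<String> directTypes(String individual) {
        Set<String> all = TYPES.getOrDefault(individual, Set.of());
        Set<String> direct = new TreeSet<>();
        for (String type : all) {
            boolean hasMoreSpecific = false;
            for (String other : all) {
                if (!other.equals(type)
                        && SUPERCLASSES.getOrDefault(other, Set.of()).contains(type)) {
                    hasMoreSpecific = true;
                    break;
                }
            }
            if (!hasMoreSpecific) {
                direct.add(type);
            }
        }
        return direct;
    }

    public static void main(String[] args) {
        System.out.println("fido -> " + directTypes("fido"));   // prints: fido -> [Dog]
        System.out.println("kitty -> " + directTypes("kitty")); // prints: kitty -> [Cat]
    }
}
```

If the !other.equals(type) check were dropped, the reflexive entries would make every type rule itself out, and the result for each individual would be empty.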

Depending on which API/triplestore/library you use to execute these queries, there may also be other, tool-specific solutions. For example, the Sesame API (disclosure: I am on the Sesame dev team) has the option to disable reasoning for the purpose of a single query:

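// Prepare the query (text elided here); QueryLanguage is org.openrdf.query.QueryLanguage.
TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ...");
// Evaluate against explicitly asserted statements only, ignoring inferred ones.
query.setIncludeInferred(false);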

TupleQueryResult result = query.evaluate();

Sesame also offers an optional additional inferencer (called the 'direct type inferencer') which introduces additional 'virtual' properties you can query, such as sesame:directType, sesame:directSubClassOf, etc. Other tools will undoubtedly have similar options.