Reasoning with Fuseki, TDB and named graphs?

Posted 2019-02-18 06:33

Question:

I'm serving a dataset containing 10-20 named graphs from a TDB dataset in Fuseki 2. I'd like to use a reasoner to do inference on my data. The behaviour I'd like to see is that triples inferred within each graph should appear within those graphs (although it would be fine if the triples appear in the default graph too). Is there a simple way of configuring this? I haven't found any configuration examples that match what I am trying to do.

The configuration I've tried is very similar to the following standard example.

DatasetTDB -> GraphTDB -> InfModel -> RDFDataset

The final view of the data I see is only a very tiny subset of the data (it appears that all the named graphs are dropped somewhere along this pipeline, and only the tiny default graph is left). Using tdb:unionDefaultGraph seems to have no effect on this.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Example of a data service with SPARQL query and update on an 
# inference model.  Data is taken from TDB.

## ---------------------------------------------------------------
## Service with only SPARQL query on an inference model.
## Inference model base data is in TDB.

<#service2>  rdf:type fuseki:Service ;
fuseki:name              "inf" ;             # http://host/inf
fuseki:serviceQuery      "sparql" ;          # SPARQL query service
fuseki:serviceUpdate     "update" ;
fuseki:dataset           <#dataset> ;
.

<#dataset> rdf:type       ja:RDFDataset ;
ja:defaultGraph       <#model_inf> ;
 .

<#model_inf> a ja:InfModel ;
 ja:baseModel <#tdbGraph> ;
 ja:reasoner [
     ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
 ] .

## Base data in TDB.
<#tdbDataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
# If the unionDefaultGraph is used, then the "update" service should be removed.
# tdb:unionDefaultGraph true ;
.

<#tdbGraph> rdf:type tdb:GraphTDB ;
tdb:dataset <#tdbDataset> .

Does anyone have any thoughts on this?

Also, bonus points if there's a way to make the dataset writable. (On some level, what I'm trying to do is approach the default behaviour of Owlim/GraphDB, which keeps persistent named graphs, does inferencing, and also allows for updates.)

Thanks in advance.

Answer 1:

I faced the same problems in my own setup, and I have a partial solution. Unfortunately, the link provided in the comments did not really help with the issues I'm still facing, but this answers part of the problem.

Quoting the question: "The final view of the data I see is only a very tiny subset of the data (it appears that all the named graphs are dropped somewhere along this pipeline, and only the tiny default graph is left). Using tdb:unionDefaultGraph seems to have no effect on this."

The workaround I found for this is to explicitly 'register' your named graphs in the configuration file. I don't know whether it is the best way, and I did not find any documentation or example for this exact context. A working example from my setup (Fuseki 2.4):

# [usual configuration start]

# TDB Dataset
:tdb_dataset_readwrite
        a             tdb:DatasetTDB ;
        tdb:unionDefaultGraph true ;
        # Set this if you want all data to be available in the default graph
        # without naming the graphs (FROM / FROM NAMED) in the SPARQL query.
        tdb:location  "your/dataset/path" .

# Underlying RDF Dataset
<#dataset> 
    rdf:type    ja:RDFDataset ;
    ja:defaultGraph <#model> ;
    ja:namedGraph [
        ja:graphName    <your/graph/URI> ;
        ja:graph        <#graphVar> 
    ] ;

    # [repeat the ja:namedGraph block for other named graphs]
    .      


######
# Default Model : Inference rules (OWL, here)
<#model> a ja:InfModel;
    ja:baseModel <#tdbGraph>;
    ja:reasoner
    [ ja:reasonerURL 
        <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
    ]
    .

# Graph for the default Model
<#tdbGraph> rdf:type tdb:GraphTDB;
    tdb:dataset :tdb_dataset_readwrite .

######
# Named Graph
<#graphVar> rdf:type tdb:GraphTDB ;
    tdb:dataset :tdb_dataset_readwrite ;
    tdb:graphName <your/graph/URI> 
    .

Then you can run a query like this one:

# [prefixes]

SELECT ?graph ?predicate ?object
WHERE {
  GRAPH ?graph {[a specific entity identifier] ?predicate ?object}
}
LIMIT 50

And it will display (in this case) properties and values, and the source graph where they were found.

BUT: in this example, even if the default graph supposedly imported inference rules (that should be applied globally, especially since the unionDefaultGraph parameter is enabled), they are not applied in a "cross-graph" manner, and that is the problem I am still facing.

Normally, if you add the inference engine to every graph, this should work, according to Andy Seaborne's post here, but it doesn't work in my case.
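
For reference, here is a minimal sketch of what "adding the inference engine to every graph" could look like in the assembler file. It is untested here and, as said above, did not give me cross-graph inference. The graph URI <http://example.org/graph1> and the node names <#graph1_inf> and <#graph1_tdb> are hypothetical placeholders; <#model> and :tdb_dataset_readwrite are the ones from my configuration above.

```turtle
@prefix :    <#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Dataset whose named graph is an inference model rather than a raw TDB graph.
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#model> ;
    ja:namedGraph [
        ja:graphName <http://example.org/graph1> ;   # hypothetical graph URI
        ja:graph     <#graph1_inf>
    ] .

# Inference model wrapping one named graph (repeat per graph).
<#graph1_inf> a ja:InfModel ;
    ja:baseModel <#graph1_tdb> ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
    ] .

# The underlying named graph stored in TDB.
<#graph1_tdb> rdf:type tdb:GraphTDB ;
    tdb:dataset   :tdb_dataset_readwrite ;
    tdb:graphName <http://example.org/graph1> .
```

With this layout each reasoner only sees the triples of its own base graph, so anything it infers stays scoped to that graph.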

Hope this helps nevertheless.



Answer 2:

I've come across this issue many times myself, but I had never seen a solution. However, I managed to figure it out after reading the documentation about "special graph names" in TDB datasets. From what I understand, setting the union default graph for a TDB dataset in the assembler file only changes what is returned when that particular dataset is queried. However, there is a special graph name that can be used to reference the union graph: <urn:x-arq:UnionGraph>. So, simply create a GraphTDB, reference the TDB dataset and point it to this special graph.

The config file below does what is requested in the question: reasoning is performed over the union default graph and exposed through a read-only reasoning endpoint, while the TDB dataset itself is exposed as a separate writable service. (Note that the reasoning service will not see any changes in the dataset until it is reloaded, since reasoning is all done in memory.)

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# TDB
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .


# Service 1: Dataset endpoint (no reasoning)
:dataService a fuseki:Service ;
  fuseki:name           "tdbEnpoint" ;
  fuseki:serviceQuery   "sparql", "query" ;
  fuseki:serviceUpdate  "update" ;
  fuseki:dataset        :tdbDataset ;
.

# Service 2: Reasoning endpoint
:reasoningService a fuseki:Service ;
  fuseki:dataset                 :infDataset ;
  fuseki:name                    "reasoningEndpoint" ;
  fuseki:serviceQuery            "query", "sparql" ;
  fuseki:serviceReadGraphStore   "get" ;
.

# Inference dataset
:infDataset rdf:type ja:RDFDataset ;
            ja:defaultGraph :infModel ;
.

# Inference model
:infModel a ja:InfModel ;
           ja:baseModel :g ;

           ja:reasoner [
              ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
           ] ;
.

# Intermediate graph referencing the default union graph
:g rdf:type tdb:GraphTDB ;
   tdb:dataset :tdbDataset ;
   tdb:graphName <urn:x-arq:UnionGraph> ;
.

# The location of the TDB dataset
:tdbDataset rdf:type tdb:DatasetTDB ;
            tdb:location "/fuseki/databases/db" ;
            tdb:unionDefaultGraph true ; 
.