How to recursively expand blank nodes in SPARQL co

2020-07-05 06:41发布

问题:

There is probably an easy to answer to this, but I can't even figure out how to formulate the Google query to find it.

I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like

CONSTRUCT {?x ?y ?z .} WHERE {?x ?y ?z .}

Then one of my results might be:

nm:John nm:owns _:Node

Which is a problem if all of the

_:Node nm:has nm:Hats

triples don't also get into the query result somehow (because some parsers I'm using like rdflib for Python really don't like dangling bnodes).

Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?

回答1:

Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that version is out of date) but bnode tests aren't available (afaik).

You could just remove the statements with trailing bnodes:

CONSTRUCT {?x ?y ?z .} WHERE 
{
  ?x ?y ?z .
  FILTER (!isBlank(?z))
}

or try your luck fetching the next bit:

CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE 
{
  ?x ?y ?z .
  OPTIONAL {
    ?z ?w ?v
    FILTER (isBlank(?z) && !isBlank(?v))
  }
}

(that last query is pretty punishing, btw)

You may be better off with DESCRIBE, which will often skip bnodes.



回答2:

As user205512 suggests, performing that grab recursively is not possible, and as they point out, using optional(s) to go arbitrary levels down into your data getting the nodes is not feasible on anything but non-trivial sized databases.

Bnodes themselves are locally scoped, to the result set, or to the file. There's no guarantee that a BNode is you get from parsing or from a result set is the same id that is used in the database (though some database do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ? where { ?s ?p ?o }" -- note that bnode is treated as a variable in that case, not as "the thing w/ the id 'bnodeid1'" This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for stuff that would otherwise be bnodes, and named resources v. bnodes will not increase overhead during querying.

That does not help you recurse down and grab data, but for that, I don't recommend doing such general queries; they don't scale well and usually return more than you want or need. I'd suggest you do more directed queries. Your original construct query will pull down the contents of the entire database, that's generally not what you want.

Lastly, while describe can be useful, there's not a standard implementation; the SPARQL spec doesn't define any particular behavior, so what it returns is left to the database vendor, and it can be different. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of describe, you're best off implementing it yourself. Doing something like the concise bounded description for a resource is an easy piece of code, though you can run into some headaches around Bnodes.



回答3:

With regard to working with the ruby RDF.rb library, which allows SPARQL queries with significant convenience methods on RDF::Graph objects, the following should expand blank nodes.

rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
  g = RDF::Graph.new
  rdf.query([subject, nil, nil]) do |s,p,o|
    g << [s,p,o]
    g << rdf_expand_blank_nodes(o) if o.node?
  end
end

def rdf_expand_blank_nodes(object)
  g = RDF::Graph.new
  if object.node?
    rdf.query([object, nil, nil]) do |s,p,o|
      g << [s,p,o]
      g << rdf_expand_blank_nodes(o) if o.node?
    end
  end
  g
end