Functions to manipulate RDF collections in SPARQL

Posted 2019-01-26 18:16

I would like to know if there are some functions to manipulate RDF Collections in SPARQL.

A motivating problem is the following.

Suppose you have:

@prefix : <http://example.org#> .
:x1 :value 3 .
:x2 :value 5 .
:x3 :value 6 .
:x4 :value 8 .

:list :values (:x1 :x2 :x3 :x4) .

And you want to calculate the following formula: ((Xn - Xn-1) + ... + (X2 - X1)) / (N - 1)

Is there some general way to calculate it?

Up until now, I was only able to calculate it for a fixed set of values. For example, for 4 values, I can use the following query:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?r { 
 ?list :values ?ls .
 ?ls rdf:first ?x1 .
 ?ls rdf:rest/rdf:first ?x2 .
 ?ls rdf:rest/rdf:rest/rdf:first ?x3 .
 ?ls rdf:rest/rdf:rest/rdf:rest/rdf:first ?x4 .
 ?x1 :value ?v1 .
 ?x2 :value ?v2 .
 ?x3 :value ?v3 .
 ?x4 :value ?v4 .
 BIND ( ((?v4 - ?v3) + (?v3 - ?v2) + (?v2 - ?v1)) / 3 as ?r)
}

What I would like is some way to access the Nth value and to define some kind of recursive function to calculate that expression. I think it is not possible, but maybe someone has a nice solution.

Tags: rdf sparql
2 answers
Bombasti
#2 · 2019-01-26 18:43

No built-ins that make formulas easier…

SPARQL does include some mathematical functions for arithmetic and aggregate computations. However, I don't know of any particularly convenient way of concisely representing mathematical expressions in SPARQL. I've been looking at a paper lately that discusses an ontology for representing mathematical objects like expressions and definitions. The authors implemented a system to evaluate these, but I don't think it used SPARQL (or at least, it wasn't just a simple extension of SPARQL).

Wenzel, Ken, and Heiner Reinhardt. "Mathematical Computations for Linked Data Applications with OpenMath." Joint Proceedings of the 24th Workshop on OpenMath and the 7th Workshop on Mathematical User Interfaces (MathUI). 2012.

…but we can still do this case.

That said, this particular case isn't too hard to do, since it's not too hard to work with RDF lists in SPARQL, and SPARQL includes the mathematical functions needed for this expression. First, a bit about RDF list representation, which will make the solution easier to understand. (If you're already familiar with this, you can skip the next paragraph or two.)

RDF lists are linked lists: each list is related to its first element by the rdf:first property, and to the rest of the list by rdf:rest. So the convenient notation (:x1 :x2 :x3 :x4) is actually shorthand for:

_:l1 rdf:first :x1 ; rdf:rest _:l2 .
_:l2 rdf:first :x2 ; rdf:rest _:l3 .
_:l3 rdf:first :x3 ; rdf:rest _:l4 .
_:l4 rdf:first :x4 ; rdf:rest rdf:nil .

Representing blank nodes with [], we can make this a bit clearer:

[ rdf:first :x1 ;
  rdf:rest [ rdf:first :x2 ;
             rdf:rest [ rdf:first :x3 ;
                        rdf:rest [ rdf:first :x4 ;
                                   rdf:rest rdf:nil ]]]]
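
To make the linked-list structure concrete, here is a minimal Python sketch that models the four list cells as a plain dictionary and walks them the way a chain of rdf:rest steps would. The dictionary, the cell names _:l1 to _:l4, and the helper list_items are illustrative assumptions, not part of any RDF library.

```python
# Hypothetical in-memory model of the list cells (not an RDF library API).
# Each cell has an rdf:first (its element) and an rdf:rest (the next cell).
RDF_NIL = "rdf:nil"

cells = {
    "_:l1": {"first": ":x1", "rest": "_:l2"},
    "_:l2": {"first": ":x2", "rest": "_:l3"},
    "_:l3": {"first": ":x3", "rest": "_:l4"},
    "_:l4": {"first": ":x4", "rest": RDF_NIL},
}


def list_items(head):
    """Follow rdf:rest from `head` until rdf:nil, collecting each rdf:first."""
    items = []
    node = head
    while node != RDF_NIL:
        items.append(cells[node]["first"])
        node = cells[node]["rest"]
    return items


print(list_items("_:l1"))  # [':x1', ':x2', ':x3', ':x4']
```

Starting from any inner cell yields the tail of the list from that point on, which is why every cell reachable by rdf:rest is itself a list.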

Once the head of the list has been identified, that is, the cell whose rdf:first is :x1, any list l reachable from it by zero or more repetitions of rdf:rest/rdf:rest (in other words, an even number of rdf:rest steps) is a list whose rdf:first is an odd-numbered element of the list (since you started indexing at 1). Going forward one more rdf:rest from l, we reach a list l' whose rdf:first is an even-numbered element of the list.

Since SPARQL 1.1 property paths let us write (rdf:rest/rdf:rest)* to denote any even number of repetitions of rdf:rest, we can write the following query, which binds the :value of each odd-numbered element to ?n and the :value of the following even-numbered element to ?nPlusOne. The math in the SELECT clause is straightforward, although to get N-1 we actually use 2*COUNT(*)-1, because the number of rows (each of which binds elements n and n+1) is N/2.

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/(2*COUNT(*)-1) as ?result) {
 ?list :values [ (rdf:rest/rdf:rest)* [ rdf:first [ :value ?n ] ; 
                                        rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}

Results (using Jena's command line ARQ):

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.333333333333333333333333 |
------------------------------

which is what is expected since

 (5 - 3) + (8 - 6)     2 + 2     4      _ 
------------------- = ------- = --- = 1.3
      (4 - 1)            3       3
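
As a cross-check of that arithmetic, the following Python sketch reproduces the rows that the (rdf:rest/rdf:rest)* pattern would match for the example data and applies the same SUM and COUNT expression. It works on a plain list of the four values rather than on RDF; the names values and rows are assumptions made for illustration.

```python
from fractions import Fraction

# :value of :x1 .. :x4, taken from the question's data.
values = [3, 5, 6, 8]

# (rdf:rest/rdf:rest)* reaches the cells at offsets 0, 2, 4, ...; each row
# pairs that cell's value (?n) with the value in the next cell (?nPlusOne).
rows = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]

# Same expression as the SELECT clause: SUM(?nPlusOne - ?n) / (2*COUNT(*) - 1)
result = Fraction(sum(b - a for a, b in rows), 2 * len(rows) - 1)

print(rows, result)  # [(3, 5), (6, 8)] 4/3
```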

Update

I just realized that what is implemented above was based on my comment on the question about whether the summation was correct, because it simplified very easily. That is, the above implements

((x2 - x1) + (x4 - x3) + ... + (xN - xN-1)) / (N - 1)

whereas the original question asked for

((x2 - x1) + (x3 - x2) + … + (xN-1 - xN-2) + (xN - xN-1)) / (N - 1)

The original is even simpler, since the pairs are identified by each rdf:rest of the original list, not just even numbers of repetitions. Each row now binds one consecutive pair, so there are N - 1 rows and COUNT(*) is already the denominator. Using the same approach as above, the query can be written as:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/COUNT(*) as ?result) {
 ?list :values [ rdf:rest* [ rdf:first [ :value ?n ] ; 
                             rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------

Of course, since the expression can be simplified to

(xN - x1) / (N - 1)

we can also just use a query which binds ?x1 to the first element of the list, ?xn to the last element, and ?xi to each element of the list (so that COUNT(?xi) (and also COUNT(*)) is the number of items in the list):

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (((?xn-?x1)/(COUNT(?xi)-1)) as ?result) WHERE {
 ?list :values [ rdf:rest*/rdf:first [ :value ?xi ] ;
                 rdf:first [ :value ?x1 ] ;
                 rdf:rest* [ rdf:first [ :value ?xn ] ; 
                             rdf:rest  rdf:nil ]] .
}
GROUP BY ?x1 ?xn

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------
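
The agreement between the last two queries can also be checked outside SPARQL. The Python sketch below computes the average of consecutive differences both ways, pair by pair and from the endpoints only, on the example values; the function names are made up for this illustration.

```python
from fractions import Fraction


def mean_step_pairwise(values):
    """Average of consecutive differences, one term per adjacent pair."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return Fraction(sum(diffs), len(diffs))


def mean_step_endpoints(values):
    """Telescoped form: (xN - x1) / (N - 1)."""
    return Fraction(values[-1] - values[0], len(values) - 1)


values = [3, 5, 6, 8]  # :value of :x1 .. :x4
print(mean_step_pairwise(values), mean_step_endpoints(values))  # 5/3 5/3
```

Both forms give 5/3, matching the 1.666... that ARQ reports, because every intermediate value cancels in the sum.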
乱世女痞
#3 · 2019-01-26 19:03

You may also have a look at alternative ways of describing/representing lists in RDF, e.g., with the help of the Ordered List Ontology. I think with this model you can more easily query what you want ;)
