I have the following data:
<node:1> <urn:connectTo> <node:2> .
<node:1> <urn:connectTo> <node:3> .
<node:1> <urn:connectTo> <node:4> .
<node:2> <urn:connectTo> <node:10> .
<node:2> <urn:connectTo> <node:11> .
<node:2> <urn:connectTo> <node:12> .
<node:3> <urn:connectTo> <node:21> .
<node:3> <urn:connectTo> <node:13> .
<node:3> <urn:connectTo> <node:41> .
<node:3> <urn:connectTo> <node:100> .
<node:4> <urn:connectTo> <node:119> .
<node:4> <urn:connectTo> <node:120> .
As you can see, every node has multiple connections. I want to select one random connection for each node. How can I do this? I've tried the following queries, but none of them solve the problem:
- select ?currentNode ?nextNode where { ?currentNode ?p ?nextNode BIND(RAND() AS ?orderKey) } ORDER BY ?orderKey LIMIT 1
- select ?currentNode (SAMPLE(?nextNode) AS ?nextNode1) where { ?currentNode ?p ?nextNode } GROUP BY ?currentNode
  Note: this returns the first connection of each node every time, not a random one.
- select ?currentNode ?nextNode (COUNT(?nextNode) AS ?noOfChoices) where { ?currentNode ?p ?nextNode BIND(RAND() AS ?orderKey) } GROUP BY ?currentNode ORDER BY ?orderKey OFFSET (RAND()*?noOfChoices) LIMIT 1
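The SAMPLE aggregate returns an individual from within a group. SPARQL 1.1 defines Sample as "a set function which returns an arbitrary value from the multiset passed to it." Grouping the connections by their source node and applying SAMPLE to the targets therefore selects one connection per node.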
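A minimal sketch of such a query, assuming the <urn:connectTo> predicate from the data in the question; the variable names are illustrative:

```sparql
# Assumes the <urn:connectTo> predicate used in the question's data.
# Group the edges by their source node and let SAMPLE pick one target per group.
# SAMPLE only has to return *some* value from each group, so an engine may
# legitimately return the same target on every run.
SELECT ?currentNode (SAMPLE(?next) AS ?nextNode)
WHERE {
  ?currentNode <urn:connectTo> ?next .
}
GROUP BY ?currentNode
```

This is guaranteed to give exactly one connection per node, but not a random one.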
Of course, as you note, the implementation only has to return an arbitrary individual, and it could easily be the same one every time. You could shuffle the targets in a subquery by ordering on RAND() and hope that SAMPLE then picks up different values, but there is no requirement that the order of a subquery's results is preserved either, so this is not guaranteed to work.
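A sketch of that subquery approach, again assuming the <urn:connectTo> predicate and illustrative variable names:

```sparql
# Assumes the <urn:connectTo> predicate used in the question's data.
SELECT ?currentNode (SAMPLE(?next) AS ?nextNode)
WHERE {
  # Inner query: attach a fresh random key to every edge and sort by it,
  # hoping the engine passes the shuffled rows to SAMPLE in that order.
  {
    SELECT ?currentNode ?next
    WHERE {
      ?currentNode <urn:connectTo> ?next .
      BIND(RAND() AS ?orderKey)
    }
    ORDER BY ?orderKey
  }
}
GROUP BY ?currentNode
```

The ORDER BY inside the subquery is only a hint here: SPARQL does not promise that the outer grouping sees the rows in sorted order, so whether SAMPLE reflects the shuffle depends on the engine.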
This seems to work with Apache Jena, where repeated calls return different connections.