SPARQL: randomly select one connection for each node

I have the following data:

<node:1><urn:connectTo><node:2>
<node:1><urn:connectTo><node:3>
<node:1><urn:connectTo><node:4>
<node:2><urn:connectTo><node:10>
<node:2><urn:connectTo><node:11>
<node:2><urn:connectTo><node:12>
<node:3><urn:connectTo><node:21>
<node:3><urn:connectTo><node:13>
<node:3><urn:connectTo><node:41>
<node:3><urn:connectTo><node:100>
<node:4><urn:connectTo><node:119>
<node:4><urn:connectTo><node:120>

As you can see, every node has multiple connections. I want to select one connection randomly for each node. How can I do this? I've tried the following queries, but none solve the problem:

select ?currentNode ?nextNode where {
  ?currentNode ?p ?nextNode
  BIND(RAND() AS ?orderKey)
}
ORDER BY ?orderKey
LIMIT 1

select ?currentNode (SAMPLE(?nextNode) as ?nextNode1)
where {
  ?currentNode ?p ?nextNode
}
GROUP BY ?currentNode

Note: this returns the first connection of each node every time, not a random one.

select ?currentNode ?nextNode (COUNT(?nextNode) AS ?noOfChoices)
where {
  ?currentNode ?p ?nextNode
  BIND(RAND() AS ?orderKey)
}
GROUP BY ?currentNode
ORDER BY ?orderKey
OFFSET (RAND()*?noOfChoices)
LIMIT 1

Tags: random rdf sparql semantic-web

1 answer

爷、活的狠高调

#2 · 2019-02-19 07:34

The sample aggregate returns an individual from within a group:

Sample is a set function which returns an arbitrary value from the multiset passed to it. … For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return values. Note that Sample() is not required to be deterministic for a given input, the only restriction is that the output value must be present in the input multiset.

This would be a query like:

prefix node: <node:>
prefix urn: <urn:>

select ?source (sample(?_target) as ?target) where {
  ?source urn:connectTo ?_target
}
group by ?source

---------------------
| source | target   |
=====================
| node:1 | node:2   |
| node:2 | node:10  |
| node:3 | node:13  |
| node:4 | node:119 |
---------------------

Of course, as you note, the implementation only has to return an arbitrary individual, and that could easily be the same one each time. You could order the targets randomly in a subquery and hope that this changes what sample returns, but nothing requires that the order of a subquery's results be preserved either. That would look like this:

prefix node: <node:>
prefix urn: <urn:>

select ?source (sample(?_target) as ?target) where {
  { select ?source ?_target {
      ?source urn:connectTo ?_target
    }
    order by rand() }
}
group by ?source

This seems to work with Apache Jena. Here are results from repeated calls:

---------------------
| source | target   |
=====================
| node:1 | node:2   |
| node:2 | node:11  |
| node:3 | node:100 |
| node:4 | node:120 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:11  |
| node:3 | node:13  |
| node:4 | node:120 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:10  |
| node:3 | node:21  |
| node:4 | node:119 |
---------------------

---------------------
| source | target   |
=====================
| node:1 | node:3   |
| node:2 | node:10  |
| node:3 | node:100 |
| node:4 | node:119 |
---------------------
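
Since neither sample nor the subquery ordering is guaranteed to behave randomly on every engine, one fallback is to fetch all pairs with a plain "select ?source ?target" and do the random choice on the client. The following is only a minimal sketch in Python under that assumption: the rows are hard-coded from the question's data rather than fetched from an endpoint, and pick_random_target is a hypothetical helper name, not part of any SPARQL library.

```python
import random
from collections import defaultdict


def pick_random_target(rows, rng=random):
    """Hypothetical helper: pick one target uniformly at random per source.

    rows is an iterable of (source, target) pairs, e.g. the bindings of a
    plain "select ?source ?target" query that were already fetched.
    rng is anything with a choice() method (the random module by default).
    """
    targets_by_source = defaultdict(list)
    for source, target in rows:
        targets_by_source[source].append(target)
    # One independent uniform draw per group.
    return {
        source: rng.choice(targets)
        for source, targets in targets_by_source.items()
    }


# The question's data, written out as (source, target) pairs.
rows = [
    ("node:1", "node:2"), ("node:1", "node:3"), ("node:1", "node:4"),
    ("node:2", "node:10"), ("node:2", "node:11"), ("node:2", "node:12"),
    ("node:3", "node:21"), ("node:3", "node:13"), ("node:3", "node:41"),
    ("node:3", "node:100"),
    ("node:4", "node:119"), ("node:4", "node:120"),
]

picks = pick_random_target(rows)
for source, target in sorted(picks.items()):
    print(source, target)
```

Passing a seeded random.Random instance as rng makes the picks reproducible, which can help when testing. The trade-off is that every connection has to be transferred to the client, so this only suits data sets small enough to fetch in full.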
