What are values starting with “t” and how to ignor

I am trying to query the frequency of certain attributes in Wikidata, using SPARQL.

For example, to find out what the frequency of different values for gender is, I have the following query:

SELECT ?rid (COUNT(?rid) AS ?count)
WHERE { ?qid wdt:P21 ?rid.
  BIND(wd:Q5 AS ?human)
  ?qid wdt:P31 ?human.
} GROUP BY ?rid

I get the following result:

wd:Q6581097 2752163
wd:Q6581072 562339
wd:Q1052281 223
wd:Q1097630 68
wd:Q2449503 67
wd:Q48270   36
wd:Q44148   8
wd:Q43445   4
t152990852  1
t152990762  1
t152990752  1
t152990635  1
t152775383  1
t152775370  1
t152775368  1
...

I have the following questions regarding this:

1. What do those t152... values refer to?
2. How can I ignore the tuples containing t152...?
   I tried FILTER ( !strstarts(str(?rid), "wd:") ) but it timed out.
3. How can I count the number of distinct answers?
   I tried SELECT (COUNT(DISTINCT ?rid) AS ?count) with the above query, but again it timed out.

Tags: sparql wikidata blazegraph blank-nodes

1 answer

兄弟一词,经得起流年.

#2 · 2019-05-25 21:19

Values starting with t are "skolemized" unknown values (see, e.g., Q2423351 for a person of unknown sex or gender).
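
If the grouped result has already been fetched, the t... rows can also be set aside on the client. A minimal Python sketch, assuming the endpoint returns the standard SPARQL JSON results format and reports the unknown values as blank nodes (type "bnode"), as in the output above; the function name tally_by_kind and the sample rows are made up for illustration.

```python
from collections import Counter

def tally_by_kind(results):
    """Sum the per-value counts in a SPARQL JSON result, grouped by term type."""
    totals = Counter()
    for row in results["results"]["bindings"]:
        kind = row["rid"]["type"]  # "uri", "bnode" or "literal"
        totals[kind] += int(row["count"]["value"])
    return totals

# Hand-made sample mirroring three rows of the output in the question.
sample = {
    "results": {
        "bindings": [
            {"rid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q6581097"},
             "count": {"type": "literal", "value": "2752163"}},
            {"rid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q6581072"},
             "count": {"type": "literal", "value": "562339"}},
            {"rid": {"type": "bnode", "value": "t152990852"},
             "count": {"type": "literal", "value": "1"}},
        ]
    }
}

totals = tally_by_kind(sample)
print(totals["uri"], totals["bnode"])
```

This only helps once a query has returned; it does not make the query itself any cheaper.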

To improve performance, I suggest dividing your query into three parts:

All "normal" genders:

SELECT ?rid (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   ?rid wdt:P31 wd:Q48264 
} GROUP BY ?rid ORDER BY DESC(?count)

Please note that, according to Wikidata, wd:Q746411 is a subclass of wd:Q48270, etc.

All "non-normal" genders:

SELECT ?rid (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   FILTER (?rid NOT IN
           (
            wd:Q6581097,
            wd:Q6581072,
            wd:Q1052281,
            wd:Q2449503,
            wd:Q48270,
            wd:Q746411,
            wd:Q189125,
            wd:Q1399232,
            wd:Q3277905
           )
          ).
   FILTER (isURI(?rid))
} GROUP BY ?rid ORDER BY DESC(?count)

I do not use FILTER NOT EXISTS {?rid wdt:P31 wd:Q48264 } for performance reasons.

All (i.e. 1) "unknown" genders:

SELECT (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   FILTER (!isURI(?rid))
}

In your case it makes little difference whether you count the wd:Q5 instances distinctly or not, but the non-distinct count is preferable for performance reasons.
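
The three partial results can then be put back together on the client, which also covers the question about the number of distinct values: it is simply the number of rows in the merged table. A rough Python sketch, assuming each grouped result has been loaded into a dict from value to count; merge_parts and the unknown total of 7 are hypothetical, the other figures are borrowed from the listing in the question, and which part a given value actually lands in depends on Wikidata.

```python
def merge_parts(normal, other, unknown_total):
    """Merge the two grouped results and the unknown total into one table."""
    # A value returned by both grouped queries has the same count in each,
    # so it is kept once instead of being added up.
    table = {**other, **normal}
    distinct_known = len(table)  # number of distinct non-unknown values
    if unknown_total:
        table["unknown"] = unknown_total
    return table, distinct_known

# Illustrative inputs only.
normal = {"wd:Q6581097": 2752163, "wd:Q6581072": 562339}
other = {"wd:Q44148": 8, "wd:Q43445": 4}

table, distinct_known = merge_parts(normal, other, unknown_total=7)
print(distinct_known)
for rid, count in sorted(table.items(), key=lambda kv: -kv[1]):
    print(rid, count)
```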
