I'm wondering if we can know whether two resources have the same category or some subcategory (i.e., belong to categories of some common supercategory) in DBpedia? I tried this query in the DBpedia endpoint but it's wrong:
select distinct ?s ?s2 where {
?s skos:subject <http :// dbpedia.org/resource/ Category ?c.
?s2 skos:subject <http :// dbpedia.org/resource/ Category ?c2.
?c=?c2.
}
DBpedia doesn't use `skos:subject` for resources, but rather relates resources to their Wikipedia categories using `dcterms:subject`. You can find out what data is available by browsing the resource pages. E.g., you might have a look at http://dbpedia.org/resource/Mount_Monadnock. If you want to find categories that two resources have in common, just use the same variable as the object of `dcterms:subject` for both resources.

You can write that more concisely with the `^property` notation and object lists. Writing `o ^p s` is the same as writing `s p o`. Object lists let you write `s p o1, o2` instead of `s p o1. s p o2.` Putting these together, the two patterns for Mount Monadnock and Spofford Lake collapse into one: `?category ^dcterms:subject` followed by both resources. There's just one result, Landforms of Cheshire County, New Hampshire, since they only have one category in common.
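As a minimal sketch of the common-category query described above: the prefixes are declared explicitly so the query is self-contained, and the IRI `dbr:Spofford_Lake` is an assumption based on the usual DBpedia naming pattern (only the Mount Monadnock IRI appears above). The long form is kept as a comment next to the concise form.

```sparql
PREFIX dbr:     <http://dbpedia.org/resource/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?category WHERE {
  # Long form: two patterns that share the ?category variable.
  #   dbr:Mount_Monadnock dcterms:subject ?category .
  #   dbr:Spofford_Lake   dcterms:subject ?category .

  # Concise form: inverse property (^) plus an object list.
  # dbr:Spofford_Lake is an assumed IRI.
  ?category ^dcterms:subject dbr:Mount_Monadnock, dbr:Spofford_Lake .
}
```

Both forms match the same data; the concise one just avoids repeating the property.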
Now, categories are related to their supercategories in DBpedia by `skos:broader`, as you can see in http://dbpedia.org/page/Category:Landforms_of_Cheshire_County,_New_Hampshire, where there are links to its broader categories.

This means that if two things have some common category (or supercategory), each will be related to that category by a path starting with a `dcterms:subject` link and followed by zero or more `skos:broader` links. Thus, you could use a query that connects each resource to the same variable with the property path `dcterms:subject/skos:broader*`.

You'll find, unfortunately, that the DBpedia endpoint runs into memory usage problems with that query, so you can't run it exactly like that. However, the DBpedia SPARQL endpoint supports a property path feature that actually didn't make it into the standard; you can write `p{n,m}` to denote a chain of length at least `n` and at most `m`. This means you can put some ranges on the `skos:broader` part of the path that will get you most of the same results as `*`.
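The following is a sketch of the supercategory query under stated assumptions: the upper bound of 3 on the `skos:broader` chain is illustrative (pick whatever the endpoint tolerates), and `dbr:Spofford_Lake` is an assumed IRI. The unbounded standard form is shown as a comment for comparison. The `{n,m}` syntax is a non-standard extension, so this will likely not parse on endpoints that accept only standard SPARQL 1.1.

```sparql
PREFIX dbr:     <http://dbpedia.org/resource/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?category WHERE {
  # Standard SPARQL 1.1 form (may hit memory limits on the public endpoint):
  #   dbr:Mount_Monadnock dcterms:subject/skos:broader* ?category .
  #   dbr:Spofford_Lake   dcterms:subject/skos:broader* ?category .

  # Bounded form using the non-standard {n,m} extension.
  # The bound {0,3} is an assumed, illustrative value.
  dbr:Mount_Monadnock dcterms:subject/skos:broader{0,3} ?category .
  dbr:Spofford_Lake   dcterms:subject/skos:broader{0,3} ?category .
}
```

`DISTINCT` is used because the same category can be reached along several different paths.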
This works with Tom Cruise and Madonna as well, though you'll need to scale back the path length a bit because of the memory issues. For instance, a query with a shorter maximum path length returns seventy-four results.
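A sketch of what such a query might look like, assuming the resource IRIs `dbr:Tom_Cruise` and `dbr:Madonna` and at most two `skos:broader` steps. Both the IRIs and the bound are illustrative assumptions, so the number of results may differ from the figure mentioned above and will depend on the current DBpedia data.

```sparql
PREFIX dbr:     <http://dbpedia.org/resource/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?category WHERE {
  # Assumed IRIs; check the resource pages to confirm they identify
  # the intended people.
  # The bound {0,2} is an assumed value, shortened to limit memory use.
  dbr:Tom_Cruise dcterms:subject/skos:broader{0,2} ?category .
  dbr:Madonna    dcterms:subject/skos:broader{0,2} ?category .
}
```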
It's worth noting, though, that Wikipedia categories aren't types. So while Mount Monadnock and Spofford Lake are both rightly considered to be landforms, neither is a geography or, as you'll see in the later query, New Hampshire. Wikipedia categories are much more about topic than a type hierarchy.
Related reading
There's a related (but not quite duplicate) question that you might find helpful as well: Using SPARQL to locate a subject with multiple occurrences of same property.