Finding common categories or supercategories of resources

Posted 2019-08-12 06:48

Question:

I'm wondering whether we can find out if two resources in DBpedia have the same category or some common supercategory (i.e., belong to categories that share a supercategory). I tried this query in the DBpedia endpoint, but it's wrong:

select distinct ?s ?s2 where {
?s skos:subject <http://dbpedia.org/resource/Category ?c.
?s2 skos:subject <http://dbpedia.org/resource/Category ?c2.
?c=?c2.
}

Answer 1:

DBpedia doesn't use skos:subject for resources, but rather relates resources to their Wikipedia categories using dcterms:subject. You can find out what data is available by browsing the resource pages. E.g., you might have a look at http://dbpedia.org/resource/Mount_Monadnock. If you want to find categories that two resources have in common, just use the same variable. E.g.,

?subject1 dcterms:subject ?category .
?subject2 dcterms:subject ?category .
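To see why reusing ?category works, here is a minimal Python sketch that evaluates those two triple patterns over a tiny in-memory set of triples. The triples are hypothetical toy data (Category:Mountains_of_New_Hampshire and Category:Lakes_of_New_Hampshire are made up for illustration; this is not what DBpedia actually stores). Because both patterns share one variable, the result is a join: a category is kept only if both subjects point at it with dcterms:subject.

```python
# Hypothetical toy triples (subject, predicate, object), for illustration only.
TRIPLES = {
    ("Mount_Monadnock", "dcterms:subject", "Category:Landforms_of_Cheshire_County,_New_Hampshire"),
    ("Mount_Monadnock", "dcterms:subject", "Category:Mountains_of_New_Hampshire"),
    ("Spofford_Lake", "dcterms:subject", "Category:Landforms_of_Cheshire_County,_New_Hampshire"),
    ("Spofford_Lake", "dcterms:subject", "Category:Lakes_of_New_Hampshire"),
}


def categories_of(subject):
    """Bindings for ?category in the pattern `subject dcterms:subject ?category`."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == "dcterms:subject"}


def common_categories(subject1, subject2):
    """Both patterns share ?category, so the solutions are the intersection (a join)."""
    return categories_of(subject1) & categories_of(subject2)


print(common_categories("Mount_Monadnock", "Spofford_Lake"))
```

With this toy data only the Cheshire County landforms category survives the join, since it is the only object both subjects share.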

You can write that more concisely with the ^property notation and object lists. Writing o ^p s is the same as writing s p o. Object lists let you write s p o1, o2 instead of s p o1 . s p o2 . Putting these together, we can write:

?category ^dcterms:subject ?subject1, ?subject2 .
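The two rewrite rules can be captured as a small text transformation. The helper below, desugar_inverse_object_list, is hypothetical (it is not part of any SPARQL library) and only handles a single inverted predicate followed by an object list, which is enough to show that the compact form expands back into the two plain triple patterns.

```python
def desugar_inverse_object_list(obj, prop, subjects):
    """Expand `obj ^prop s1, s2, ...` into plain `s prop obj .` triple patterns.

    `o ^p s` means the same as `s p o`, and an object list repeats the
    pattern once for each listed item.
    """
    return [f"{s} {prop} {obj} ." for s in subjects]


for pattern in desugar_inverse_object_list("?category", "dcterms:subject", ["?subject1", "?subject2"]):
    print(pattern)
```

Running it prints ?subject1 dcterms:subject ?category . followed by ?subject2 dcterms:subject ?category . which is the pair of patterns from the previous example.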

E.g., here's a query that finds common categories of Mount Monadnock and Spofford Lake. There's just one result, Landforms of Cheshire County, New Hampshire, since they only have one category in common.

select * where {
  ?category ^dcterms:subject dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}

Now, categories are related to their supercategories in DBpedia by skos:broader, as you can see in http://dbpedia.org/page/Category:Landforms_of_Cheshire_County,_New_Hampshire, where there are links to

  • http://dbpedia.org/resource/Category:Landforms_of_New_Hampshire_by_county and
  • http://dbpedia.org/resource/Category:Geography_of_Cheshire_County,_New_Hampshire

This means that if two things have some common category (or supercategory), each will be related to that category by a path that starts with a dcterms:subject link followed by zero or more skos:broader links. Thus, you could use a query like

select * where {
  ?category ^(dcterms:subject/skos:broader*) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
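One way to picture what dcterms:subject/skos:broader* matches is a breadth-first walk: start from a resource's direct categories, then keep following skos:broader links until nothing new turns up. The Python sketch below does that over hypothetical toy data (ResourceA, CatA, MidA, Shared and Root are made-up names, not DBpedia categories) and intersects the two results, mirroring the shared ?category variable. It only models the path semantics; it says nothing about how the DBpedia endpoint evaluates the query.

```python
from collections import deque

# Hypothetical toy data: direct categories (dcterms:subject) and
# category -> supercategory links (skos:broader).
SUBJECT = {"ResourceA": {"CatA"}, "ResourceB": {"CatB"}}
BROADER = {"CatA": {"MidA"}, "MidA": {"Shared"}, "CatB": {"Shared"}, "Shared": {"Root"}}


def reachable_categories(resource):
    """Categories matched by dcterms:subject/skos:broader* starting at `resource`."""
    start = SUBJECT.get(resource, set())
    seen = set(start)  # zero skos:broader steps: the direct categories themselves
    queue = deque(start)
    while queue:  # breadth-first walk up the skos:broader links
        for parent in BROADER.get(queue.popleft(), set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def common_ancestor_categories(r1, r2):
    """Categories reachable from both resources (the shared ?category)."""
    return reachable_categories(r1) & reachable_categories(r2)


print(sorted(common_ancestor_categories("ResourceA", "ResourceB")))
```

On this toy graph it prints ['Root', 'Shared']. The seen set keeps the walk finite even if the category graph contains cycles; the unbounded walk is also why the search space can grow very large on a real category hierarchy.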

Unfortunately, the DBpedia endpoint runs into memory problems with that query, so you can't run it exactly as written. However, the DBpedia SPARQL endpoint supports a property path feature that didn't make it into the standard: you can write p{n,m} to denote a chain of at least n and at most m p links. This means you can bound the path length and still get most of the same results as with *:

select distinct ?category where {
  ?category ^(dcterms:subject/(skos:broader{0,3})) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
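To illustrate what a bound such as {0,3} changes, here is a hedged Python model of p{n,m} applied to skos:broader, again on made-up data (every name is hypothetical). Each loop iteration follows exactly one more skos:broader link, and only the levels between min_len and max_len are kept. With a small max_len, common ancestors that sit further up the hierarchy are simply not found, which is the trade-off the bounded query makes.

```python
# Hypothetical toy data (not real DBpedia categories).
SUBJECT = {"ResourceA": {"CatA"}, "ResourceB": {"CatB"}}
BROADER = {"CatA": {"MidA"}, "MidA": {"Shared"}, "CatB": {"Shared"}, "Shared": {"Root"}}


def bounded_categories(resource, min_len, max_len):
    """Categories matched by dcterms:subject/(skos:broader{min_len,max_len})."""
    level = set(SUBJECT.get(resource, set()))  # exactly 0 skos:broader steps
    result = set(level) if min_len == 0 else set()
    for steps in range(1, max_len + 1):
        # Everything reachable with exactly `steps` skos:broader links.
        level = {parent for cat in level for parent in BROADER.get(cat, set())}
        if steps >= min_len:
            result |= level
        if not level:  # no more supercategories to follow
            break
    return result


def common_bounded(r1, r2, max_len):
    """Shared categories when both paths use at most `max_len` skos:broader links."""
    return bounded_categories(r1, 0, max_len) & bounded_categories(r2, 0, max_len)


for depth in range(4):
    print(depth, sorted(common_bounded("ResourceA", "ResourceB", depth)))
```

Running it finds no common category with 0 or 1 steps, Shared with 2, and Root plus Shared with 3, so each extra step can add results while also enlarging the set of categories that has to be explored.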

This works with Tom Cruise and Madonna as well, though you'll need to scale back the path length a bit because of the memory issues. For instance, the following query returns seventy-four results.

select distinct ?category where {
  ?category
      ^(dcterms:subject/(skos:broader{0,2}))
          <http://dbpedia.org/resource/Tom_Cruise>,
          <http://dbpedia.org/resource/Madonna_(entertainer)> .
}

It's worth noting, though, that Wikipedia categories aren't types. So while Mount Monadnock and Spofford Lake are rightly considered to be landforms, neither is a geography or, as you'll see in the results of the later query, New Hampshire. Wikipedia categories organize pages by topic rather than forming a type hierarchy.

Related reading

There's a related (but not quite duplicate) question that you might find helpful as well: Using SPARQL to locate a subject with multiple occurrences of same property.