Question:
I am trying to extract the Wikipedia category hierarchy or the YAGO classification hierarchy for DBpedia resources using the SPARQL endpoint. For instance, for an entity such as http://dbpedia.org/resource/Nokia, I would like to find all of its categories and classes in hierarchical form, like Thing → Organization → Company → … → Nokia.
Answer 1:
A simple SPARQL select query can retrieve the information that you're interested in, though the results won't be arranged hierarchically. You want all the types of a resource, as well as the rdfs:subClassOf relations between them. Here's a very simple query for Nokia that can be run on the DBpedia SPARQL endpoint:
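# Prefixes are declared explicitly so the query does not rely on the endpoint's predefined ones.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT * WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}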
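To use the select query from a program rather than the web form, something like the following Python sketch may help. It assumes the public endpoint lives at https://dbpedia.org/sparql and that it honours an Accept header asking for the standard SPARQL JSON results format; the function names are made up for illustration. Only fetch_edges touches the network, and it is deliberately not called here.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint URL

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT * WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"


def bindings_to_edges(result):
    """Turn a parsed SPARQL JSON result into (subclass, superclass) pairs."""
    return [
        (row["c1"]["value"], row["c2"]["value"])
        for row in result["results"]["bindings"]
    ]


def fetch_edges(query=QUERY):
    """Run the query over HTTP (needs network access, so not called here)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return bindings_to_edges(json.load(response))
```

Each pair returned by bindings_to_edges is one rdfs:subClassOf edge between two of Nokia's types.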
If you treat each pair of classes in that result set as a directed edge and perform a topological sort, you'll see the hierarchy of the classes to which the Nokia resource belongs. In fact, since it is probably convenient to treat this as a graph, you can get it in the form of an RDF graph by using a SPARQL construct query:
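PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
CONSTRUCT WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}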
The construct query produces this graph (in N3 format):
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix yago: <http://dbpedia.org/class/yago/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
dbpedia-owl:Agent rdfs:subClassOf owl:Thing .
dbpedia-owl:Company rdfs:subClassOf dbpedia-owl:Organisation .
dbpedia-owl:Organisation rdfs:subClassOf dbpedia-owl:Agent .
yago:CompaniesBasedInEspoo rdfs:subClassOf yago:Company108058098 .
dbpedia:Nokia rdf:type yago:CompaniesListedOnTheHelsinkiStockExchange ,
owl:Thing ,
yago:CompaniesBasedInEspoo ,
dbpedia-owl:Agent ,
yago:DisplayTechnologyCompanies ,
yago:ElectronicsCompaniesOfFinland ,
dbpedia-owl:Company ,
dbpedia-owl:Organisation ,
yago:Company108058098 ,
yago:CompaniesEstablishedIn1865 .
yago:CompaniesEstablishedIn1865 rdfs:subClassOf yago:Company108058098 .
yago:CompaniesListedOnTheHelsinkiStockExchange rdfs:subClassOf yago:Company108058098 .
yago:DisplayTechnologyCompanies rdfs:subClassOf yago:Company108058098 .
yago:ElectronicsCompaniesOfFinland rdfs:subClassOf yago:Company108058098 .
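The topological sort mentioned earlier can be sketched in Python with the standard library's graphlib module (Python 3.9 or later). The edge list below is copied by hand from the rdfs:subClassOf triples in the graph above; the function name general_to_specific is just an illustrative choice.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# rdfs:subClassOf edges from the graph above, as (subclass, superclass).
EDGES = [
    ("dbpedia-owl:Agent", "owl:Thing"),
    ("dbpedia-owl:Company", "dbpedia-owl:Organisation"),
    ("dbpedia-owl:Organisation", "dbpedia-owl:Agent"),
    ("yago:CompaniesBasedInEspoo", "yago:Company108058098"),
    ("yago:CompaniesEstablishedIn1865", "yago:Company108058098"),
    ("yago:CompaniesListedOnTheHelsinkiStockExchange", "yago:Company108058098"),
    ("yago:DisplayTechnologyCompanies", "yago:Company108058098"),
    ("yago:ElectronicsCompaniesOfFinland", "yago:Company108058098"),
]


def general_to_specific(edges):
    """Order classes so that every superclass precedes its subclasses."""
    sorter = TopologicalSorter()
    for subclass, superclass in edges:
        # add(node, *predecessors): the superclass must come out first.
        sorter.add(subclass, superclass)
    return list(sorter.static_order())


order = general_to_specific(EDGES)
for cls in order:
    print(cls)
```

In the resulting order, owl:Thing comes before dbpedia-owl:Agent, which comes before dbpedia-owl:Organisation and then dbpedia-owl:Company. The YAGO classes form a second, separate branch under yago:Company108058098, so the sort yields one valid linearization of two hierarchies rather than a single chain.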
Remarks
The queries above retrieve the rdf:type hierarchy for Nokia. In the question, you also mention Wikipedia categories. DBpedia resources are associated with the Wikipedia categories to which their corresponding articles belong by the dcterms:subject property. Those Wikipedia categories are then structured hierarchically by skos:broader. These really are not types for the individuals, though. For instance, the data contain:
dbpedia:Nokia dcterms:subject category:Finnish_brands .
category:Finnish_brands skos:broader category:Brands_by_country .
While it probably makes sense to say that Nokia is a Finnish_brand, it makes much less sense to say that Nokia is a Brand_by_country.
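If you still want the category hierarchy, one option is to walk skos:broader upward and record how many hops each category is from the resource, so that distant categories are not mistaken for types. The Python sketch below uses only the two triples quoted above as in-memory data; the function name, the max_depth parameter, and the visited check (a guard in case the category graph contains cycles) are assumptions made for illustration.

```python
from collections import deque

# The two triples from the example above, as adjacency lists.
SUBJECT = {"dbpedia:Nokia": ["category:Finnish_brands"]}                  # dcterms:subject
BROADER = {"category:Finnish_brands": ["category:Brands_by_country"]}    # skos:broader


def category_depths(resource, subject=SUBJECT, broader=BROADER, max_depth=3):
    """Breadth-first walk up skos:broader, recording the hop count per category."""
    depths = {}
    queue = deque((category, 0) for category in subject.get(resource, []))
    while queue:
        category, depth = queue.popleft()
        if category in depths or depth > max_depth:
            continue  # already visited (cycle guard) or too far from the resource
        depths[category] = depth
        for parent in broader.get(category, []):
            queue.append((parent, depth + 1))
    return depths


depths = category_depths("dbpedia:Nokia")
```

Here category:Finnish_brands ends up at depth 0 (a direct subject of Nokia) and category:Brands_by_country at depth 1, which matches the intuition that the further up you go, the less the category describes the resource itself.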