Extracting hierarchy for DBpedia entity using SPARQL

Posted 2019-06-07 22:38

Question:

I am trying to extract the Wikipedia category hierarchy or the YAGO classification for DBpedia resources using the SPARQL endpoint. For instance, for an entity such as http://dbpedia.org/resource/Nokia, I would like to find all of its categories and classes in hierarchical form, like Thing → Organization → Company → … → Nokia.

Answer 1:

A simple SPARQL select can retrieve the information that you're interested in, though it won't be arranged hierarchically. You want all the types of a resource, as well as the rdfs:subClassOf relations between them. Here's a very simple query for Nokia that can be run on the DBpedia SPARQL endpoint:

SELECT * WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}
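
To run the query above from a script rather than from the endpoint's web form, something like the following minimal sketch might work. It assumes the public endpoint lives at https://dbpedia.org/sparql and follows the standard SPARQL protocol (a `query` URL parameter and JSON results requested through the Accept header). The names `build_request`, `parse_edges`, and `fetch_edges` are hypothetical helpers, and the PREFIX lines are spelled out so that the query does not rely on the endpoint's predefined prefixes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint URL; adjust if you use a mirror.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}
"""

def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)

def parse_edges(payload):
    """Turn a SPARQL JSON result document into (subclass, superclass) pairs."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [(b["c1"]["value"], b["c2"]["value"]) for b in bindings]

def fetch_edges(query=QUERY):
    """Send the query over the network and return the class pairs."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return parse_edges(resp.read().decode("utf-8"))
```

Calling `fetch_edges()` needs network access; the other two functions can be exercised offline.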

If you treat each pair of classes in that result set as a directed edge and perform a topological sort, then you'll see the hierarchy of the classes to which the Nokia resource belongs. In fact, since it is probably convenient to treat this as a graph, you can get it in the form of an RDF graph by using a SPARQL construct query.

CONSTRUCT WHERE {
  dbpedia:Nokia a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}

The construct query produces this graph (in N3 format):

@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia-owl:    <http://dbpedia.org/ontology/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix yago:   <http://dbpedia.org/class/yago/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia:    <http://dbpedia.org/resource/> .

dbpedia-owl:Agent   rdfs:subClassOf owl:Thing .
dbpedia-owl:Company rdfs:subClassOf dbpedia-owl:Organisation .
dbpedia-owl:Organisation    rdfs:subClassOf dbpedia-owl:Agent .
yago:CompaniesBasedInEspoo  rdfs:subClassOf yago:Company108058098 .
dbpedia:Nokia   rdf:type    yago:CompaniesListedOnTheHelsinkiStockExchange ,
        owl:Thing ,
        yago:CompaniesBasedInEspoo ,
        dbpedia-owl:Agent ,
        yago:DisplayTechnologyCompanies ,
        yago:ElectronicsCompaniesOfFinland ,
        dbpedia-owl:Company ,
        dbpedia-owl:Organisation ,
        yago:Company108058098 ,
        yago:CompaniesEstablishedIn1865 .
yago:CompaniesEstablishedIn1865 rdfs:subClassOf yago:Company108058098 .
yago:CompaniesListedOnTheHelsinkiStockExchange  rdfs:subClassOf yago:Company108058098 .
yago:DisplayTechnologyCompanies rdfs:subClassOf yago:Company108058098 .
yago:ElectronicsCompaniesOfFinland  rdfs:subClassOf yago:Company108058098 .
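
The topological sort suggested earlier can be sketched in Python with the standard library's graphlib module (Python 3.9+). The edge list below is copied by hand from the rdfs:subClassOf triples in the graph above; the function name `general_to_specific` is only illustrative. Several orderings are valid, so this sketch only guarantees that each superclass appears before its subclasses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (subclass, superclass) pairs taken from the CONSTRUCT output above.
EDGES = [
    ("dbpedia-owl:Agent", "owl:Thing"),
    ("dbpedia-owl:Company", "dbpedia-owl:Organisation"),
    ("dbpedia-owl:Organisation", "dbpedia-owl:Agent"),
    ("yago:CompaniesBasedInEspoo", "yago:Company108058098"),
    ("yago:CompaniesEstablishedIn1865", "yago:Company108058098"),
    ("yago:CompaniesListedOnTheHelsinkiStockExchange", "yago:Company108058098"),
    ("yago:DisplayTechnologyCompanies", "yago:Company108058098"),
    ("yago:ElectronicsCompaniesOfFinland", "yago:Company108058098"),
]

def general_to_specific(edges):
    """Order classes so that every superclass precedes its subclasses."""
    sorter = TopologicalSorter()
    for sub, sup in edges:
        # add(node, *predecessors): the superclass must come before the subclass.
        sorter.add(sub, sup)
    return list(sorter.static_order())

order = general_to_specific(EDGES)
for cls in order:
    print(cls)
```

Note that the result is a single linear order over two separate trees (the DBpedia ontology classes and the YAGO classes), so to display them as hierarchies you would still group the classes by following the edges.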

Remarks

The queries above retrieve the rdf:type hierarchy for Nokia. In the question, you also mention Wikipedia categories. The dcterms:subject property links a DBpedia resource to the Wikipedia categories to which its corresponding article belongs. Those Wikipedia categories are then structured hierarchically by skos:broader. These categories are not really types of the individuals, though. For instance, the data contain:

dbpedia:Nokia dcterms:subject category:Finnish_brands .
category:Finnish_brands skos:broader category:Brands_by_country .

While it probably makes sense to say that Nokia is a Finnish_brand, it makes much less sense to say that Nokia is a Brand_by_country.
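
If you still want the category hierarchy, the dcterms:subject and skos:broader links can be walked in the same spirit. The sketch below is a minimal illustration over in-memory data: it holds only the two triples quoted above, and the function name `category_chains` is hypothetical. It skips any category already on the current path, on the assumption that the category graph may contain cycles.

```python
# Illustrative data only: the two triples quoted above, as simple maps.
SUBJECT = {"dbpedia:Nokia": ["category:Finnish_brands"]}
BROADER = {"category:Finnish_brands": ["category:Brands_by_country"]}

def category_chains(resource, subject=SUBJECT, broader=BROADER):
    """Return each path from a direct category up to one with no broader category."""
    chains = []

    def walk(path):
        # Ignore categories already on this path so a cycle cannot loop forever.
        parents = [p for p in broader.get(path[-1], []) if p not in path]
        if not parents:
            chains.append(path)
        for parent in parents:
            walk(path + [parent])

    for category in subject.get(resource, []):
        walk([category])
    return chains

print(category_chains("dbpedia:Nokia"))
# prints [['category:Finnish_brands', 'category:Brands_by_country']]
```

Each chain reads from the most specific category to the broadest one, and, as noted above, only the first element is something the resource can sensibly be said to "be".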