I have a list of company names such as Microsoft Corp, Kimberly Clark Corporation etc, and for each company, I would like to retrieve fields such as:
- Its company logo
- Geographic identifier for Google Maps
- Website url
- Year established
- Stock exchange and stock exchange ticker symbol
- A way to get the stock prices over the last few days
- About / abstract from wikipedia
- A list of subsidiaries and parent companies. For instance, for Boeing it would be Jeppesen, Aviall, Inc., etc.
I have looked into SPARQL and DBpedia. Any suggestions on how to write a SPARQL query that retrieves some of this information? (I don't need to retrieve all the fields, just a couple to get me started.)
Thanks!
You can start using a query like this:
select * where {
  values ?company { dbpedia:Microsoft
                    <http://dbpedia.org/resource/Apple_Inc.>
                    dbpedia:Kimberly-Clark
                  }
  OPTIONAL { { ?company dbpprop:logo ?logo FILTER(isIRI(?logo)) }
             UNION
             { ?company foaf:depiction ?logo FILTER(isIRI(?logo)) } }
  OPTIONAL { ?company dbpedia-owl:abstract ?abstract
             FILTER(langMatches(lang(?abstract),"EN")) }
  OPTIONAL { ?company geo:lat ?latitude ;
                      geo:long ?longitude }
  OPTIONAL { ?company dbpedia-owl:foundingDate ?foundingDate }
  OPTIONAL { ?company dbpedia-owl:wikiPageExternalLink ?externalLink }
  OPTIONAL { ?company dbpprop:symbol ?stockSymbol }
  OPTIONAL { ?company dbpedia-owl:subsidiary ?subsidiaryPage }
}
SPARQL Results
I based this on the properties I saw on the DBpedia pages for Microsoft, Kimberly-Clark, and Apple Inc. The data isn't particularly clean, and because of that I added a few filters to the query. Some things to watch for:
- Not all of these companies list subsidiaries, and the subsidiary property for Microsoft doesn't point to a subsidiary but to a page that presumably enumerates some subsidiaries.
- Some of the companies have bad information for the logos (hence the FILTERs with isIRI). For instance, Apple's dbpprop:logo is the integer 150. I think that comes from the Wikipedia infobox line | logo = [[File:{{#property:p154}}|150px]], where 150 is getting pulled out rather than a more meaningful value. Filtering by isIRI helps a little bit.
- Some of the companies have multiple founding dates. I'm not sure how you might decide which one to use.
- While the company page is usually listed as an external link, not all of the external links associated with a page are the company page. I'm not sure how you could select one as the company page.
All that said, it looks like you can get a lot of this information from DBpedia.
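Because every OPTIONAL can match more than once, a company with two founding dates or several external links comes back as several rows. One way to deal with that on the client side is to collapse the rows per company and then apply a rule of your own, such as taking the earliest founding date. The following is only a sketch: it assumes the results were requested in the standard SPARQL JSON results format, the function names are hypothetical, and the sample data is made up (it is not real DBpedia output).

```python
import json
from collections import defaultdict


def collapse_bindings(results, key="company"):
    """Group SPARQL JSON result rows by `key`, keeping distinct values per variable."""
    merged = defaultdict(lambda: defaultdict(set))
    for row in results["results"]["bindings"]:
        if key not in row:
            continue  # skip rows where the grouping variable is unbound
        subject = row[key]["value"]
        for var, cell in row.items():
            if var != key:
                merged[subject][var].add(cell["value"])
    # Turn the sets into sorted lists so the output is stable and JSON-friendly.
    return {
        subject: {var: sorted(values) for var, values in fields.items()}
        for subject, fields in merged.items()
    }


def earliest(values):
    """Pick the smallest value; assumes dates are ISO strings like 1976-04-01."""
    return min(values) if values else None


# Made-up sample in the SPARQL JSON results layout (hypothetical company and values).
SAMPLE = json.loads("""
{
  "head": {"vars": ["company", "foundingDate", "stockSymbol"]},
  "results": {"bindings": [
    {"company": {"type": "uri", "value": "http://example.org/AcmeCorp"},
     "foundingDate": {"type": "literal", "value": "1977-01-03"},
     "stockSymbol": {"type": "literal", "value": "ACME"}},
    {"company": {"type": "uri", "value": "http://example.org/AcmeCorp"},
     "foundingDate": {"type": "literal", "value": "1976-04-01"},
     "stockSymbol": {"type": "literal", "value": "ACME"}}
  ]}
}
""")

companies = collapse_bindings(SAMPLE)
acme = companies["http://example.org/AcmeCorp"]
print(acme["stockSymbol"])             # one distinct symbol despite two rows
print(earliest(acme["foundingDate"]))  # the earlier of the two dates
```

Taking the earliest date is just one possible policy; whether it is the right one depends on what the extra dates in the data actually mean (founding versus incorporation, for example).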
You could start with the following SPARQL query. It retrieves all the triples for the subject whose foaf:name is "Apple Inc.".
select distinct ?subject ?predicate ?object where {
?subject ?predicate ?object .
?subject <http://xmlns.com/foaf/0.1/name> "Apple Inc."@en .
}
SPARQL results
subject predicate object
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Thing
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Company
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.opengis.net/gml/_Feature
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Organisation
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Agent
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Organization
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ComputerCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/SoftwareCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/RetailCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/CompaniesEstablishedIn1976
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/ComputerHardwareCompanies
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://umbel.org/umbel/rc/Organization
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/Company108058098
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/HomeComputerHardwareCompanies
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/CompaniesBasedInCupertino,California
http://dbpedia.org/resource/Apple_Inc. http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/class/yago/MobilePhoneManufacturers
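A dump like this is easier to scan once the rows are grouped by predicate, since that shows at a glance which properties a resource actually has before you write a more targeted query. Here is a rough sketch of that grouping; the helper names are hypothetical, and the three rows are copied from the results above.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def group_by_predicate(triples):
    """Map each predicate IRI to the list of objects seen with it."""
    grouped = defaultdict(list)
    for _subject, predicate, obj in triples:
        grouped[predicate].append(obj)
    return dict(grouped)


def local_name(iri):
    """Return the part of an IRI after its last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


# A few (subject, predicate, object) rows taken from the result table.
ROWS = [
    ("http://dbpedia.org/resource/Apple_Inc.", RDF_TYPE, "http://www.w3.org/2002/07/owl#Thing"),
    ("http://dbpedia.org/resource/Apple_Inc.", RDF_TYPE, "http://dbpedia.org/ontology/Company"),
    ("http://dbpedia.org/resource/Apple_Inc.", RDF_TYPE, "http://schema.org/Organization"),
]

grouped = group_by_predicate(ROWS)
type_names = [local_name(obj) for obj in grouped[RDF_TYPE]]
print(type_names)  # ['Thing', 'Company', 'Organization']
```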