I'am using the following sparql query to extract from dbpedia the pages which match a specific infobox:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX res:<http://dbpedia.org/resource/>
SELECT DISTINCT *
WHERE {
?page dbpedia:wikiPageUsesTemplate ?template .
?page rdfs:label ?label .
FILTER (regex(?template, 'Infobox_artist')) .
FILTER (lang(?label) = 'en')
}
LIMIT 100
In this line of the query :
FILTER (regex(?template, 'Infobox_artist')) .
I get all the infoboxes that start with artist as artist_discography and other which I don't need. My question is: how can I get by a regex only the infoboxes that matche exactly "infobox_artist" ?
As it is a regex you should be able to restrict the search as follows:
FILTER (regex(?template, '^Infobox_artist$')) .
^
is the beginning of a string
$
is the end of a string
in a regex.
NB: I've not used sparql, so this may well not work.
While the approach suggested by @beny23 works, it is really very inefficient. Using a regex for essentially matching an exact value is (potentially) putting an unnessary burden on the endpoint being queried. This is bad practice.
The value of ?template
is a URI, so you really should use a value comparison (or even inline as @cygri demonstrated):
SELECT DISTINCT * {
?page dbpedia:wikiPageUsesTemplate ?template .
?page rdfs:label ?label .
FILTER (lang(?label) = 'en')
FILTER (?template = <http://dbpedia.org/resource/Template:Infobox_artist> )
}
LIMIT 100
You can still easily adapt this query string in code to work with different types of infoboxes. Also: depending on which toolkit you use to create and execute SPARQL queries, you may have some programmatic alternatives to make query reuse even easier.
For example, you can create a "prepared query" which you can reuse, and set a binding to a particular value before executing it. For example, in Sesame you could do something like this:
String q = "SELECT DISTINCT * { " +
" ?page dbpedia:wikiPageUsesTemplate ?template . " +
" ?page rdfs:label ?label . " +
" FILTER (lang(?label) = 'en') " +
" } LIMIT 100 ";
TupleQuery query = conn.prepareTupleQuery(SPARQL, q);
URI infoboxArtist = f.createURI(DBPedia.NAMESPACE, "Template:Infobox_artist");
query.setBinding("template", infoboxArtist);
TupleQueryResult result = query.evaluate();
(As an aside: showing example using Sesame because I'm on the Sesame development team, but no doubt other SPARQL/RDF toolkits have similar functionality)
If all you want to do is a direct string comparison, then you don't need a regex! This is simpler and faster:
SELECT DISTINCT * {
?page dbpedia:wikiPageUsesTemplate
<http://dbpedia.org/resource/Template:Infobox_artist> .
?page rdfs:label ?label .
FILTER (lang(?label) = 'en')
}
LIMIT 100