sparql exact match regex

2019-06-25 16:40发布

问题:

I'am using the following sparql query to extract from dbpedia the pages which match a specific infobox:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX res:<http://dbpedia.org/resource/>
SELECT DISTINCT *
WHERE {
?page dbpedia:wikiPageUsesTemplate ?template .
?page rdfs:label ?label .
FILTER (regex(?template, 'Infobox_artist')) .
FILTER (lang(?label) = 'en')
}
LIMIT 100

In this line of the query :

FILTER (regex(?template, 'Infobox_artist')) .

I get all the infoboxes that start with artist as artist_discography and other which I don't need. My question is: how can I get by a regex only the infoboxes that matche exactly "infobox_artist" ?

回答1:

As it is a regex you should be able to restrict the search as follows:

FILTER (regex(?template, '^Infobox_artist$')) .
  • ^ is the beginning of a string
  • $ is the end of a string

in a regex.

NB: I've not used sparql, so this may well not work.



回答2:

While the approach suggested by @beny23 works, it is really very inefficient. Using a regex for essentially matching an exact value is (potentially) putting an unnessary burden on the endpoint being queried. This is bad practice.

The value of ?template is a URI, so you really should use a value comparison (or even inline as @cygri demonstrated):

SELECT DISTINCT * {
    ?page dbpedia:wikiPageUsesTemplate ?template .
    ?page rdfs:label ?label .
    FILTER (lang(?label) = 'en')
    FILTER (?template = <http://dbpedia.org/resource/Template:Infobox_artist> )
}
LIMIT 100

You can still easily adapt this query string in code to work with different types of infoboxes. Also: depending on which toolkit you use to create and execute SPARQL queries, you may have some programmatic alternatives to make query reuse even easier.

For example, you can create a "prepared query" which you can reuse, and set a binding to a particular value before executing it. For example, in Sesame you could do something like this:

String q = "SELECT DISTINCT * { " +
               " ?page dbpedia:wikiPageUsesTemplate ?template . " +
               " ?page rdfs:label ?label . " +
               " FILTER (lang(?label) = 'en') " +
               " } LIMIT 100 ";

TupleQuery query = conn.prepareTupleQuery(SPARQL, q);
URI infoboxArtist = f.createURI(DBPedia.NAMESPACE, "Template:Infobox_artist");
query.setBinding("template", infoboxArtist); 

TupleQueryResult result = query.evaluate();

(As an aside: showing example using Sesame because I'm on the Sesame development team, but no doubt other SPARQL/RDF toolkits have similar functionality)



回答3:

If all you want to do is a direct string comparison, then you don't need a regex! This is simpler and faster:

SELECT DISTINCT * {
    ?page dbpedia:wikiPageUsesTemplate
        <http://dbpedia.org/resource/Template:Infobox_artist> .
    ?page rdfs:label ?label .
    FILTER (lang(?label) = 'en')
}
LIMIT 100


标签: regex sparql