I'm writing a query to get the URIs of documents that have a specific title. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?document WHERE {
  ?document dc:title ?title.
  FILTER (?title = "…" ).
}
where "…" is actually the value of this.getTitle(), since the query string is generated by:
String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
"?document dc:title ?title. " +
"FILTER (?title = \"" + this.getTitle() + "\" ). }";
With the query above, I only get documents whose titles are exactly equal to this.getTitle(). Suppose this.getTitle() is made up of more than one word. I'd like to get documents even if only one of the words in this.getTitle() appears in the document title (for example). How could I do that?
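One thing to watch in the question's code: this.getTitle() is pasted straight into a double-quoted SPARQL literal, so a title containing a double quote or a backslash would break the query or change its meaning. Below is a minimal sketch of escaping the title first. The class and method names (SparqlLiteral, escape, exactTitleQuery) are hypothetical, and the sketch only handles the characters a "..." SPARQL literal cannot contain unescaped. If you happen to be using Apache Jena, its ParameterizedSparqlString can insert literals for you instead.

```java
final class SparqlLiteral {
    // Escapes text for use inside a double-quoted SPARQL string literal.
    // A "..." literal may not contain an unescaped ", \, line feed or carriage return.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Same exact-match query as in the question (minus the unused rdf: prefix),
    // with the title escaped before it is concatenated in.
    static String exactTitleQuery(String title) {
        return "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
             + "SELECT ?document WHERE { "
             + "?document dc:title ?title . "
             + "FILTER (?title = \"" + escape(title) + "\") }";
    }

    public static void main(String[] args) {
        // A title with embedded quotes no longer terminates the literal early.
        System.out.println(exactTitleQuery("Say \"Hello\""));
    }
}
```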
Let's say you've got some data like (in Turtle):
Then you can use a query like:
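A sketch of one way to write such a query, built as a Java string in the same style as the question. It leaves the splitting to SPARQL: REPLACE turns "Great Gatsby" into "Great|Gatsby", and REGEX keeps any title that contains at least one of those words. The names AnyWordQuery, build and wouldMatch are hypothetical, and wouldMatch only approximates the FILTER with java.util.regex, whose syntax differs from SPARQL's XPath-style regular expressions in details but agrees for plain words.

```java
import java.util.regex.Pattern;

final class AnyWordQuery {
    // Lets SPARQL turn the search title into an alternation:
    // REPLACE("Great Gatsby", " ", "|") yields "Great|Gatsby", and REGEX then
    // keeps every ?title that contains at least one of those words.
    static String build(String searchTitle) {
        // Minimal escaping so the title survives inside a "..." literal.
        String literal = searchTitle.replace("\\", "\\\\").replace("\"", "\\\"");
        return "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
             + "SELECT ?document ?title WHERE { "
             + "?document dc:title ?title . "
             + "BIND (REPLACE(\"" + literal + "\", \" \", \"|\") AS ?pattern) "
             + "FILTER (REGEX(?title, ?pattern)) }";
    }

    // Rough local stand-in for the FILTER above. REGEX is unanchored,
    // which corresponds to Matcher.find() rather than Matcher.matches().
    static boolean wouldMatch(String title, String searchTitle) {
        return Pattern.compile(searchTitle.replace(" ", "|")).matcher(title).find();
    }

    public static void main(String[] args) {
        System.out.println(build("Great Gatsby"));
        System.out.println(wouldMatch("The Catcher in the Rye", "Great Catcher")); // true
    }
}
```

Because REGEX matches substrings, a word such as "The" would also match a title containing "Theory".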
to get results like
What's particularly neat about this is that since you're generating the pattern on the fly, you could even base it on another value bound in the graph pattern. For instance, if you want all pairs of things whose titles match on at least one word, you could do:
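A sketch of that pairs query, again as a Java string, with the pattern now built from ?titleA, a value bound in the same graph pattern. MatchingPairs, QUERY and pairs are hypothetical names; pairs is an in-memory approximation (java.util.regex over an id-to-title map) meant only to show which ordered pairs the FILTER keeps. Since the pattern comes from the data, a title containing a character such as ( yields an invalid pattern; in SPARQL that raises an error inside the FILTER, which drops the solution, and the sketch mirrors this by skipping that title.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class MatchingPairs {
    // Ordered pairs (?a, ?b) where some word of ?titleA occurs in ?titleB.
    // The regex is built from ?titleA, i.e. from the data itself.
    static final String QUERY =
          "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
        + "SELECT ?a ?titleA ?b ?titleB WHERE { "
        + "?a dc:title ?titleA . "
        + "?b dc:title ?titleB . "
        + "FILTER (?a != ?b && REGEX(?titleB, REPLACE(?titleA, \" \", \"|\"))) }";

    // In-memory approximation of QUERY over an id -> title map.
    static List<String> pairs(Map<String, String> titles) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> a : titles.entrySet()) {
            Pattern p;
            try {
                p = Pattern.compile(a.getValue().replace(" ", "|"));
            } catch (PatternSyntaxException e) {
                continue; // invalid pattern: the FILTER errors and the row is dropped
            }
            for (Map.Entry<String, String> b : titles.entrySet()) {
                if (!a.getKey().equals(b.getKey()) && p.matcher(b.getValue()).find()) {
                    out.add(a.getKey() + " -> " + b.getKey());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> titles = new LinkedHashMap<>();
        titles.put("a", "Great Expectations");
        titles.put("b", "The Great Gatsby");
        titles.put("c", "Moby Dick");
        System.out.println(QUERY);
        System.out.println(pairs(titles)); // [a -> b, b -> a]
    }
}
```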
to get:
Of course, it's very important to note that you're now generating patterns from your data, which means that anyone who can put data into your system could put very expensive patterns in to bog down the query and cause a denial of service. On a more mundane note, you could run into trouble if any of your titles contain characters that would interfere with the regular expressions. One interesting problem would be a title with multiple consecutive spaces, so that the pattern became
The|Words|With||Two|Spaces
which is a problem, since the empty alternative between the two bars matches the empty string and so would make every title match.

This is an interesting approach, but it has a lot of caveats. In general, you could do this as shown here, or by generating the regular expression in code (where you can take care of escaping, etc.), or you could use a SPARQL engine that supports some text-based extensions (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).
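Generating the regular expression in code might look like the minimal sketch below. TitlePattern and its methods are hypothetical names. It splits the title on runs of whitespace (so a double space no longer yields an empty alternative), escapes regex metacharacters in each word, rejects titles with no words at all (an empty pattern would match everything), and then escapes the result a second time for the double-quoted SPARQL literal, where every backslash has to be doubled. The metacharacter list is an assumption based on the XPath-style regular expressions SPARQL's REGEX uses; it is worth checking against your engine.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TitlePattern {
    // Characters with special meaning outside character classes in SPARQL's
    // (XPath-style) regular expressions; each one gets a backslash in front.
    private static final String META = "\\.?*+{}()[]^$|";

    static String escapeRegex(String word) {
        StringBuilder sb = new StringBuilder(word.length() * 2);
        for (char c : word.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Splits on runs of whitespace, escapes each word, joins with '|'.
    static String alternation(String title) {
        String result = Arrays.stream(title.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(TitlePattern::escapeRegex)
                .collect(Collectors.joining("|"));
        if (result.isEmpty()) {
            // An empty pattern would match every title.
            throw new IllegalArgumentException("title has no words: \"" + title + "\"");
        }
        return result;
    }

    // Second layer of escaping for the "..." SPARQL literal.
    static String sparqlLiteral(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String query(String title) {
        return "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
             + "SELECT ?document ?title WHERE { "
             + "?document dc:title ?title . "
             + "FILTER (REGEX(?title, " + sparqlLiteral(alternation(title)) + ")) }";
    }

    public static void main(String[] args) {
        String title = "The Words With  Two Spaces";
        String naive = title.replace(" ", "|"); // The|Words|With||Two|Spaces
        String safe = alternation(title);       // The|Words|With|Two|Spaces
        System.out.println(Pattern.compile(naive).matcher("Moby Dick").find()); // true
        System.out.println(Pattern.compile(safe).matcher("Moby Dick").find());  // false
        System.out.println(query("C++ (2nd ed.)"));
    }
}
```

The naive pattern matches an unrelated title like "Moby Dick" only because of the empty alternative; the whitespace-splitting version does not.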