SPARQL queries with round brackets throw an exception

Posted 2020-04-05 07:22

Question:

I am trying to extract labels from DBpedia for some people. I have partial success, but I am stuck on the following problem. This code works:

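import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class DbPediaQueryExtractor {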
    public static void main(String [] args) {
        String entity = "Aharon_Barak";
        String queryString ="PREFIX dbres: <http://dbpedia.org/resource/> SELECT * WHERE {dbres:"+ entity+ "<http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\"))}";
        //String queryString="select *     where { ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>;  <http://www.w3.org/2000/01/rdf-schema#label>  ?o FILTER (langMatches(lang(?o),\"en\"))  } LIMIT 5000000";
        QueryExecution qexec = getResult(queryString);
        try {
            ResultSet results = qexec.execSelect();
            while (results.hasNext())
            {
                QuerySolution soln = results.nextSolution();
                System.out.println(soln.get("o"));
            }
        }
        finally {
            qexec.close();
        }
    }

    public static QueryExecution getResult(String queryString){
        Query query = QueryFactory.create(queryString);
        //VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create (sparql, graph);
        QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
        return qexec;
    }
}

However, when the entity contains brackets, it does not work. For example,

String entity = "William_H._Miller_(writer)";

leads to this exception:

Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "(" "( "" at line 1, column 86.

What is the problem?

Answer 1:

It took some copying and pasting to see exactly what was going on; I'd suggest putting newlines in your query to make it easier to read. The query you're using is:

PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:??? <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}

where ??? is replaced by the contents of the string entity. You're doing no input validation to ensure that the value of entity is legal at that position. In your failing case, entity contains William_H._Miller_(writer), so you're getting the query:

PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}
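The column number in the exception can be checked without Jena or a network call by rebuilding the question's query string in plain Java. The class and method names below are hypothetical; the query template is copied verbatim from the question's code.

```java
// Hypothetical reproduction of the question's query construction (no Jena needed).
class QueryColumnDemo {
    // Same string concatenation as in DbPediaQueryExtractor.main.
    static String buildQuery(String entity) {
        return "PREFIX dbres: <http://dbpedia.org/resource/> SELECT * WHERE {dbres:" + entity
                + "<http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\"))}";
    }

    // 1-based column of the first '(' inside the pasted entity, or -1 if it has none.
    static int offendingColumn(String entity) {
        int inEntity = entity.indexOf('(');
        if (inEntity < 0) {
            return -1;
        }
        // Position where the entity text starts, right after "dbres:".
        int entityStart = buildQuery(entity).indexOf("dbres:" + entity) + "dbres:".length();
        return entityStart + inEntity + 1; // convert 0-based index to 1-based column
    }

    public static void main(String[] args) {
        String entity = "William_H._Miller_(writer)";
        System.out.println(buildQuery(entity));
        System.out.println("'(' at column " + offendingColumn(entity));
    }
}
```

Running it prints the broken query and then '(' at column 86, the same position Jena reports.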

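That is not legal SPARQL: an unescaped ( cannot appear in the local part of a prefixed name such as dbres:William_H._Miller_, so the parser stops there. If you paste the query into the public DBpedia endpoint, you'll get a similar parse error: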

Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at 'writer' before ')'

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
  dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o 
  FILTER (langMatches(lang(?o),"en"))
}

Rather than hitting DBpedia's endpoint with bad queries, you can use the SPARQL query validator, which reports this for that query:

Syntax error: Lexical error at line 4, column 34. Encountered: ")" (41), after : "writer"

In Jena, you can use ParameterizedSparqlString to avoid these sorts of issues. Here's your example, reworked to use a parameterized string:

import com.hp.hpl.jena.query.ParameterizedSparqlString;

public class PSSExample {
    public static void main( String[] args ) {
        // Create a parameterized SPARQL string for the particular query, and add the 
        // dbres prefix to it, for later use.
        final ParameterizedSparqlString queryString = new ParameterizedSparqlString(
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "SELECT * WHERE\n" +
                "{\n" +
                "  ?entity rdfs:label ?o\n" +
                "  FILTER (langMatches(lang(?o),\"en\"))\n" +
                "}\n"
                ) {{
            setNsPrefix( "dbres", "http://dbpedia.org/resource/" );
        }};

        // Entity is the same. 
        final String entity = "William_H._Miller_(writer)";

        // Now retrieve the URI for dbres, concatenate it with entity, and use
        // it as the value of ?entity in the query.
        queryString.setIri( "?entity", queryString.getNsPrefixURI( "dbres" )+entity );

        // Show the query.
        System.out.println( queryString.toString() );
    }
}

The output is:

PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
  <http://dbpedia.org/resource/William_H._Miller_(writer)> rdfs:label ?o
  FILTER (langMatches(lang(?o),"en"))
}

You can run this query at the public endpoint and get the expected results. Notice that if you use an entity that doesn't need special escaping, e.g.,

final String entity = "George_Washington";

then the query output will use the prefixed form:

PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
  dbres:George_Washington rdfs:label ?o
  FILTER (langMatches(lang(?o),"en"))
}

This is convenient because you don't have to check whether your suffix (entity) contains characters that need escaping; Jena takes care of that for you.
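If you ever need to build these terms without Jena, the behaviour in the two outputs above (a prefixed name for George_Washington, a full <...> IRI for William_H._Miller_(writer)) can be approximated by hand. The helper below is a hypothetical sketch, not Jena's actual algorithm: it uses the dbres: prefix only when the local part consists of ASCII letters, digits, _, - and . (no . or - at the start, no . at the end), which is a conservative subset of what SPARQL allows, and otherwise writes the full IRI, rejecting characters that SPARQL does not permit inside <...>.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a simplified stand-in for what ParameterizedSparqlString does for us.
class SparqlTermSketch {
    static final String DBRES = "http://dbpedia.org/resource/";

    // Conservative, ASCII-only subset of SPARQL local names; no backslash escapes used.
    static final Pattern SAFE_LOCAL =
            Pattern.compile("[A-Za-z0-9_]([A-Za-z0-9_.-]*[A-Za-z0-9_-])?");

    static String dbresTerm(String local) {
        if (SAFE_LOCAL.matcher(local).matches()) {
            return "dbres:" + local; // safe to use the prefixed form
        }
        String iri = DBRES + local;
        for (char c : iri.toCharArray()) {
            // Characters that SPARQL's IRIREF production does not allow between < and >.
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Cannot use in an IRI: " + local);
            }
        }
        return "<" + iri + ">"; // full IRI form, parentheses are fine here
    }

    public static void main(String[] args) {
        System.out.println(dbresTerm("George_Washington"));
        System.out.println(dbresTerm("William_H._Miller_(writer)"));
    }
}
```

It prints dbres:George_Washington and then <http://dbpedia.org/resource/William_H._Miller_(writer)>, the same terms Jena produced above. ParameterizedSparqlString is still the more robust choice; this sketch only covers IRIs under a single namespace.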