Difference in performance between using VALUES key

Posted 2019-07-23 06:48

Question:

I have a fairly complex SPARQL query with the structure outlined below, involving multiple graph patterns, UNION and nested FILTER NOT EXISTS.

I want the query to remain generic and to be able to inject values for certain variables at execution time. My idea is to append a VALUES clause at the end of the query to set the value of those variables. In the structure below, I set the value of ?x, and I mark every place in the query where ?x appears.

However, in Fuseki, executing the query this way takes around 4 to 5 seconds, whereas manually replacing the ?x variable in the query with a URI, instead of specifying a VALUES clause, makes it run very fast.
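The two ways of injecting the value that are compared above can be sketched as follows. This is only an illustration of the string handling on the client side, not of how Fuseki evaluates either form. The helper names (`iri_ref`, `append_values`, `substitute`), the template query, and the predicate `<http://example.com/p>` are hypothetical; only `?x` and `<http://example.com/Foo>` come from the question.

```python
import re

def iri_ref(iri: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters that would break out of it."""
    if re.search(r'[<>"{}|^`\\\s]', iri):
        raise ValueError(f"not a safe IRI: {iri!r}")
    return f"<{iri}>"

def append_values(query: str, var: str, iri: str) -> str:
    """Keep the query text generic and append a trailing VALUES clause."""
    return f"{query.rstrip()}\nVALUES ?{var} {{ {iri_ref(iri)} }}"

def substitute(query: str, var: str, iri: str) -> str:
    """Textually replace every occurrence of ?var with the IRI.

    Caveat: this is plain text replacement, so it would also rewrite ?var
    inside string literals, comments, or the SELECT clause.
    """
    replacement = iri_ref(iri)
    # The negative lookahead keeps ?x from matching the start of ?xy.
    pattern = rf"\?{re.escape(var)}(?![A-Za-z0-9_])"
    return re.sub(pattern, lambda m: replacement, query)

# Hypothetical template, much simpler than the real query.
TEMPLATE = """SELECT ?o
WHERE {
  ?x <http://example.com/p> ?o
  FILTER NOT EXISTS { ?x a ?type }
}"""

print(append_values(TEMPLATE, "x", "http://example.com/Foo"))
print(substitute(TEMPLATE, "x", "http://example.com/Foo"))
```

The first form leaves every `?x` in place and relies on the engine to propagate the binding; the second hands the engine a query that already contains the constant.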

  • I always thought that putting a VALUES clause at the end of the WHERE clause was like setting values inline for some variables, so I would expect using the VALUES clause and replacing the variables with their corresponding URIs to be equivalent in terms of query execution. Can someone confirm the expected behavior of the VALUES keyword, and also explain the difference between using it outside the WHERE clause and inside it?
  • Does the fact that the variable set using VALUES appears inside a FILTER NOT EXISTS clause change anything?
  • Can you confirm this is the correct approach for the requirement above (I want the query to remain generic and I want to be able to inject values for certain variables at execution time)?
  • Is it possible that this behavior is specific to how Fuseki handles VALUES?

Thanks !

SELECT DISTINCT ...
WHERE {
    # ?x ...
    # ... basic graph pattern here 

    {
      {
        # ... basic graph pattern here 

        FILTER NOT EXISTS {
            # ?x ...
            # ... basic graph pattern here
        }

        FILTER NOT EXISTS {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                # ?x ...
                # ... basic graph pattern here
            }
        }       
      }
      UNION
      {
        ?x ...
        # ... basic graph pattern here
      }
      UNION
      {
        # ... basic graph pattern here

        FILTER NOT EXISTS {
            ?x ...
            # ... basic graph pattern here
        }

        FILTER NOT EXISTS {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                ?x ...
                # ... basic graph pattern here
            }
        }
      }
      UNION
      {
        ?x ...
      }
    }
}
VALUES ?x { <http://example.com/Foo> }
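The structure above uses the trailing form of VALUES, after the closing brace of WHERE. SPARQL 1.1 also permits VALUES inside a group graph pattern, which is the other placement asked about in the first bullet. A minimal sketch of producing that in-pattern form from a generic template follows. The function name, the template, and `<http://example.com/p>` are assumptions made for illustration, and nothing here shows which placement Fuseki executes faster.

```python
import re

def values_inside_where(query: str, var: str, iri: str) -> str:
    """Insert `VALUES ?var { <iri> }` directly after the first `WHERE {`.

    No validation of `iri` is done here; a real caller should check it.
    """
    block = f"VALUES ?{var} {{ <{iri}> }}"
    # count=1 so that only the first WHERE in the text is touched.
    return re.sub(
        r"WHERE\s*\{",
        lambda m: f"{m.group(0)}\n    {block}",
        query,
        count=1,
    )

# Hypothetical template standing in for the real query.
TEMPLATE = """SELECT DISTINCT ?o
WHERE {
    ?x <http://example.com/p> ?o
}"""

print(values_inside_where(TEMPLATE, "x", "http://example.com/Foo"))
```

Placed this way, the data block is joined with the other patterns of that group rather than with the result of the whole query pattern, so the two placements can produce different algebra trees.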

Answer 1:

Not supposed to be an answer, but formatting in comments is impossible...

There is at least one obvious difference in the algebra tree. How this is handled is probably implementation-specific. Andy knows better and will hopefully give a more useful answer than mine.

Without VALUES:

Query

SELECT  ?s ?o
WHERE
  {   { <test_val>  <p>  ?o }
    UNION
      { <test_val>  <p>  ?o
        FILTER NOT EXISTS { <test_val>  a                   ?type }
      }
  }

Algebra tree (optimized)

(base <http://example/base/>
  (project (?s ?o)
    (union
      (bgp (triple <test_val> <p> ?o))
      (filter (notexists (bgp (triple <test_val> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
        (bgp (triple <test_val> <p> ?o))))))

With VALUES:

Query

SELECT  ?s ?o
WHERE
  {   { ?s  <p>  ?o }
    UNION
      { ?s  <p>  ?o
        FILTER NOT EXISTS { ?s  a                     ?type }
      }
  }
VALUES ?s { <test_val> }

Algebra tree

(base <http://example/base/>
  (project (?s ?o)
    (join
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o))))
      (table (vars ?s)
        (row [?s <test_val>])
      ))))

Algebra tree (optimized)

(base <http://example/base/>
  (project (?s ?o)
    (sequence
      (table (vars ?s)
        (row [?s <test_val>])
      )
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o)))))))
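To make the difference between the two tree shapes more concrete, here is a toy model in Python. It is not Jena's implementation. It assumes that `(join (union ...) (table ...))` evaluates the union with no bindings and then keeps the rows compatible with the table, and that `(sequence (table ...) (union ...))` feeds each table row into the union as an initial binding, which is how I understand the sequence operator. The data set and all function names are made up; only `test_val` and `p` come from the queries above.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy data, not taken from any real store.
TRIPLES: List[Triple] = [
    ("test_val", "p", "o1"),
    ("test_val", "p", "o2"),
    ("other", "p", "o3"),
    ("other", "rdf:type", "T"),
]

def match(pattern: Triple, binding: Binding) -> Iterator[Binding]:
    """Yield each extension of `binding` under which `pattern` matches a triple."""
    for triple in TRIPLES:
        candidate = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield candidate

def union(binding: Binding) -> Iterator[Binding]:
    """{ ?s p ?o } UNION { ?s p ?o FILTER NOT EXISTS { ?s a ?type } }"""
    yield from match(("?s", "p", "?o"), binding)
    for b in match(("?s", "p", "?o"), binding):
        if next(match(("?s", "rdf:type", "?type"), b), None) is None:
            yield b

TABLE: List[Binding] = [{"?s": "test_val"}]

def eval_join() -> Tuple[List[Binding], int]:
    """Join shape: evaluate the union unconstrained, then filter by the table."""
    intermediate = list(union({}))
    # Simplified compatibility check: ?s is the only shared variable.
    result = [b for b in intermediate for row in TABLE if b["?s"] == row["?s"]]
    return result, len(intermediate)

def eval_sequence() -> Tuple[List[Binding], int]:
    """Sequence shape: each table row is the starting binding for the union."""
    result = [b for row in TABLE for b in union(row)]
    return result, len(result)

def as_rows(bindings: List[Binding]) -> List[Tuple[str, str]]:
    return sorted((b["?s"], b["?o"]) for b in bindings)

join_result, join_intermediate = eval_join()
seq_result, seq_intermediate = eval_sequence()
print(as_rows(join_result) == as_rows(seq_result))
print(join_intermediate, seq_intermediate)
```

In this model both shapes return the same solutions, but the join shape first produces rows for every subject before discarding those that do not match the table. Whether that accounts for the timings reported in the question depends on what the engine actually does with the tree.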