ArangoDB Not Using Index During Traversal

2019-07-31 08:51发布

问题:

I have a simple graph traversal query:

FOR e in 0..3 ANY 'Node/5025926' Edge
FILTER 

e.ModelType == "A.Model" && 
e.TargetType == "A.Target" && 
e.SourceType == "A.Source"

RETURN e

The 'Edge' edge collection has a hash index defined for attributes ModelType, TargetType, SourceType, in that order.

When checking the execution plan, the results are:

Query string:
 FOR e in 0..3 ANY 'Node/5025926' Edge
 FILTER 
 e.ModelType == "A.Model" && 
 e.TargetType == "A.Target" && 
 e.SourceType == "A.Source"
 RETURN e

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  2   TraversalNode        7     - FOR e  /* vertex */ IN 0..3  /* min..maxPathDepth */ ANY 'Node/5025926' /* startnode */  Edge
  3   CalculationNode      7     - LET #1 = (((e.`ModelType` == "A.Model") && (e.`TargetType` == "A.Target")) && (e.`SourceType` == "A.Source"))   /* simple expression */
  4   FilterNode           7     - FILTER #1
  5   ReturnNode           7     - RETURN e

Indexes used:
 none

Traversals on graphs:
 Id   Depth   Vertex collections   Edge collections   Filter conditions
  2   0..3                         Edge               

Optimization rules applied:
 none

Notice that the execution plan indicates that no indices will be used to process the query.

Is there anything I need to do to make the engine use the index on the Edge collection to process the results?

Thanks

回答1:

In ArangoDB 3.0 a traversal will always use the edge index to find connected vertices, regardless of which filter conditions are present in the query and regardless of which indexes exist.

In ArangoDB 3.1 the optimizer will try to find the best possible index for each level of the traversal. It will inspect the traversal's filter condition and for each level pick the index for which it estimates the lowest cost. If there are no user-defined indexes, it will still use the edge index to find connected vertices. Other indexes will be used if there are filter conditions on edge attributes which are also indexed and the index has a better estimated average selectivity than the edge index.

In 3.1.0 the explain output will always show "Indexes used: none" for traversals, even though a traversal will always use an index. The index display is just missing in the explain output. This has been fixed in ArangoDB 3.1.1, which will show the individual indexes selected by the optimizer for each level of the traversal.

For example, the following query shows the following explain output in 3.1:

Query string:
 FOR v, e, p in 0..3 ANY 'v/test0' e 
   FILTER p.edges[0].type == 1 && p.edges[2].type == 2 
   RETURN p.vertices 

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  2   TraversalNode     8000     - FOR v  /* vertex */, p  /* paths */ IN 0..3  /* min..maxPathDepth */ ANY 'v/test0' /* startnode */  e
  3   CalculationNode   8000     - LET #5 = ((p.`edges`[0].`type` == 1) && (p.`edges`[2].`type` == 2))   /* simple expression */
  4   FilterNode        8000     - FILTER #5
  5   CalculationNode   8000     - LET #7 = p.`vertices`   /* attribute expression */
  6   ReturnNode        8000     - RETURN #7

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields                Ranges
  2   edge   e            false    false        10.00 %   [ `_from`, `_to` ]    base INBOUND
  2   edge   e            false    false        10.00 %   [ `_from`, `_to` ]    base OUTBOUND
  2   hash   e            false    false        63.60 %   [ `_to`, `type` ]     level 0 INBOUND
  2   hash   e            false    false        64.40 %   [ `_from`, `type` ]   level 0 OUTBOUND
  2   hash   e            false    false        63.60 %   [ `_to`, `type` ]     level 2 INBOUND
  2   hash   e            false    false        64.40 %   [ `_from`, `type` ]   level 2 OUTBOUND

Additional indexes are present on [ "_to", "type" ] and [ "_from", "type" ]. Those are used on levels 0 and 2 of the traversal because there are filter conditions for the edges on these levels that can use these indexes. For all other levels, the traversal will use the indexes labeled with "base" in the "Ranges" column.

The explain output fix will become available with 3.1.1, which will be released soon.



标签: arangodb