I have a simple graph traversal query:
FOR e in 0..3 ANY 'Node/5025926' Edge
FILTER
e.ModelType == "A.Model" &&
e.TargetType == "A.Target" &&
e.SourceType == "A.Source"
RETURN e
The 'Edge' edge collection has a hash index defined for attributes ModelType, TargetType, SourceType, in that order.
When checking the execution plan, the results are:
Query string:
FOR e in 0..3 ANY 'Node/5025926' Edge
FILTER
e.ModelType == "A.Model" &&
e.TargetType == "A.Target" &&
e.SourceType == "A.Source"
RETURN e
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 TraversalNode 7 - FOR e /* vertex */ IN 0..3 /* min..maxPathDepth */ ANY 'Node/5025926' /* startnode */ Edge
3 CalculationNode 7 - LET #1 = (((e.`ModelType` == "A.Model") && (e.`TargetType` == "A.Target")) && (e.`SourceType` == "A.Source")) /* simple expression */
4 FilterNode 7 - FILTER #1
5 ReturnNode 7 - RETURN e
Indexes used:
none
Traversals on graphs:
Id Depth Vertex collections Edge collections Filter conditions
2 0..3 Edge
Optimization rules applied:
none
Notice that the execution plan indicates that no indices will be used to process the query.
Is there anything I need to do to make the engine use the index on the Edge collection to process the results?
Thanks
In ArangoDB 3.0 a traversal will always use the edge index to find connected vertices, regardless of which filter conditions are present in the query and regardless of which indexes exist.
In ArangoDB 3.1 the optimizer will try to find the best possible index for each level of the traversal. It will inspect the traversal's filter condition and for each level pick the index for which it estimates the lowest cost. If there are no user-defined indexes, it will still use the edge index to find connected vertices. Other indexes will be used if there are filter conditions on edge attributes which are also indexed and the index has a better estimated average selectivity than the edge index.
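The per-level selection described above can be pictured with a small toy model. This is only a sketch and not ArangoDB's optimizer code: the function name `pickIndexForLevel`, the shape of the index objects, and the selectivity numbers are all made up for illustration. It also assumes that a user-defined index is only a candidate when it contains the direction attribute (`_from` or `_to`) and all of its remaining fields are covered by equality filters on that level.

```javascript
// Toy model only, NOT ArangoDB internals. All names and numbers are hypothetical.
// Higher selectivity means fewer edges per lookup, i.e. a lower estimated cost.
function pickIndexForLevel(indexes, directionAttr, filteredAttrs) {
  // The edge index is the fallback that is always usable.
  let best = indexes.find((idx) => idx.type === "edge");
  for (const idx of indexes) {
    if (idx.type === "edge") continue;
    // Assumption: the index must contain the direction attribute, and every
    // other indexed field needs an equality filter on this traversal level.
    const usable =
      idx.fields.includes(directionAttr) &&
      idx.fields
        .filter((f) => f !== directionAttr)
        .every((f) => filteredAttrs.includes(f));
    if (usable && idx.selectivity > best.selectivity) best = idx;
  }
  return best;
}

// Illustrative index definitions with invented selectivity estimates.
const indexes = [
  { type: "edge", fields: ["_from", "_to"], selectivity: 0.1 },
  { type: "hash", fields: ["_to", "type"], selectivity: 0.6 },
  { type: "hash", fields: ["_from", "type"], selectivity: 0.6 },
];

// A level with a filter on `type` can use the combined index;
// a level without any edge filter falls back to the edge index.
const withFilter = pickIndexForLevel(indexes, "_from", ["type"]);
const withoutFilter = pickIndexForLevel(indexes, "_from", []);
console.log(withFilter.fields, withoutFilter.type);
```

In this model the question's index on ModelType, TargetType, SourceType would never be picked, because it contains neither `_from` nor `_to`; whether the real optimizer behaves the same way for that index is an assumption of the sketch.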
In 3.1.0 the explain output will always show "Indexes used: none" for traversals, even though a traversal will always use an index. The index display is just missing in the explain output. This has been fixed in ArangoDB 3.1.1, which will show the individual indexes selected by the optimizer for each level of the traversal.
For example, the following query produces this explain output in 3.1:
Query string:
FOR v, e, p in 0..3 ANY 'v/test0' e
FILTER p.edges[0].type == 1 && p.edges[2].type == 2
RETURN p.vertices
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 TraversalNode 8000 - FOR v /* vertex */, p /* paths */ IN 0..3 /* min..maxPathDepth */ ANY 'v/test0' /* startnode */ e
3 CalculationNode 8000 - LET #5 = ((p.`edges`[0].`type` == 1) && (p.`edges`[2].`type` == 2)) /* simple expression */
4 FilterNode 8000 - FILTER #5
5 CalculationNode 8000 - LET #7 = p.`vertices` /* attribute expression */
6 ReturnNode 8000 - RETURN #7
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
2 edge e false false 10.00 % [ `_from`, `_to` ] base INBOUND
2 edge e false false 10.00 % [ `_from`, `_to` ] base OUTBOUND
2 hash e false false 63.60 % [ `_to`, `type` ] level 0 INBOUND
2 hash e false false 64.40 % [ `_from`, `type` ] level 0 OUTBOUND
2 hash e false false 63.60 % [ `_to`, `type` ] level 2 INBOUND
2 hash e false false 64.40 % [ `_from`, `type` ] level 2 OUTBOUND
Additional indexes are present on [ "_to", "type" ] and [ "_from", "type" ]. Those are used on levels 0 and 2 of the traversal because there are filter conditions for the edges on these levels that can use these indexes. For all other levels, the traversal will use the indexes labeled with "base" in the "Ranges" column.
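For reference, indexes like the two mentioned above could be created from arangosh roughly as follows. This is a minimal sketch: the helper name `ensureTraversalIndexes` is hypothetical, and it assumes it is handed an arangosh collection object (for example `db.e`) that exposes `ensureIndex`.

```javascript
// Hypothetical helper: creates the two combined hash indexes from the example.
// `collection` is assumed to be an arangosh collection object such as db.e.
function ensureTraversalIndexes(collection) {
  return [
    // Candidate for INBOUND steps that filter on `type`.
    collection.ensureIndex({ type: "hash", fields: ["_to", "type"] }),
    // Candidate for OUTBOUND steps that filter on `type`.
    collection.ensureIndex({ type: "hash", fields: ["_from", "type"] }),
  ];
}
```

In arangosh this would be called as `ensureTraversalIndexes(db.e)`. Each index combines the direction attribute with the filtered edge attribute, matching the "Fields" column in the explain output above.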
The explain output fix will become available with 3.1.1, which will be released soon.