On multiple index usage in ArangoDB

2019-02-26 20:35发布

问题:

Having a document of a following structure:

{
  path: String,
  enabled: Long,
  disabled: null || Long,
  // other fields...
}

I would like to look the documents up by a combination of path's prefix and numerical relation between some number and document's timestamps: (pseudocode)

SELECT e FROM entries 
WHERE 
  e.path STARTS WITH "somePrefix" 
  AND e.enabled <= timestamp 
  AND (
    e.disabled == null 
    OR timestamp < e.disabled
  )

What index structure will I benefit from the most, if any? Should I have a non-sparse skiplist index on enabled field + a sparse one on disabled and a fulltext non-sparse one on the path? Is ArangoDB capable of utilising multiple indexes for these types of queries? I read the doc page on index usage, but I'm still unclear.

回答1:

ArangoDB can use multiple indexes on the same collection if the filter conditions are combined with logical or, and the indexes satisfy the or branches conditions.

In the case of your query, you have three conditions combined with logical and, with the latter containing an or.

There is no STARTS WITH predicate in AQL, but instead you could use a range query constructed with the prefix bounds: e.path >= @lower && e.path < @upper. For a search value of "somePrefix", the bounds would translate to @upper being "somePrefix", and@lowerbeing"somePrefiy"` (search value with last character increased by one).

Creating a skiplist index on path will make the query use that index.

Including the search condition for enabled, the combined condition up to now is e.path >= @lower && e.path < @upper && e.enabled <= @timestamp. Though a skiplist index can be created on multiple attributes, it won't be used here on both path and enabled, but only on path. Reversing the order of index attributes (i.e. enabled first, then path) won't help either, because then the index would be used on enabled only, but not on path.

In general, the skiplist index will be used for the parts of the condition that can produce a contiguous range. That is the case if the left-most index attributes are used in equality comparisons (e.g. e.path == @path && e.enabled <= @timestamp would work), but if its left-most index attributes are non-equality comparisons (e.g. e.path >= @lower && e.path <= @upper or @e.enabled <= @timestamp), then it won't look at its further index attributes, because it would produce a non-contiguous range anyway.

There's also the option of creating a skiplist index on disabled. That allows the optimizer to use that index on the part e.enabled <= @timestamp && (e.disabled == null || @timestamp < e.disabled). It can transform this to e.disabled == null || @timestamp < e.disabled, which however does not look very selective.

In summary: there does not seem to be a good index choice for that particular query. If you could somehow change the STARTS WITH to an equality comparison, then you could create a combined skiplist index on path and enabled, and it would potentially be rather selective. If your STARTS WITH prefixes are always the same size, it might be worth to save the prefix in an extra attribute, that can be indexed instead of the original value and be queried using an equality comparison: e.pathPrefix == @prefix && e.enabled <= @timestamp. That requires saving and maintaining an extra prefix attribute per document, but maybe its worth it when enabling the use of a much more selective index.



标签: arangodb