Having a document of a following structure:
{
path: String,
enabled: Long,
disabled: null || Long,
// other fields...
}
I would like to look the documents up by a combination of path's prefix and numerical relation between some number and document's timestamps: (pseudocode)
SELECT e FROM entries
WHERE
e.path STARTS WITH "somePrefix"
AND e.enabled <= timestamp
AND (
e.disabled == null
OR timestamp < e.disabled
)
What index structure will I benefit from the most, if any? Should I have a non-sparse skiplist index on enabled
field + a sparse one on disabled
and a fulltext non-sparse one on the path
? Is ArangoDB capable of utilising multiple indexes for these types of queries? I read the doc page on index usage, but I'm still unclear.
ArangoDB can use multiple indexes on the same collection if the filter conditions are combined with logical or, and the indexes satisfy the or branches conditions.
In the case of your query, you have three conditions combined with logical and, with the latter containing an or.
There is no STARTS WITH
predicate in AQL, but instead you could use a range query constructed with the prefix bounds: e.path >= @lower && e.path < @upper
. For a search value of "somePrefix"
, the bounds would translate to @upper
being "somePrefix", and
@lowerbeing
"somePrefiy"` (search value with last character increased by one).
Creating a skiplist index on path
will make the query use that index.
Including the search condition for enabled
, the combined condition up to now is e.path >= @lower && e.path < @upper && e.enabled <= @timestamp
. Though a skiplist index can be created on multiple attributes, it won't be used here on both path
and enabled
, but only on path
. Reversing the order of index attributes (i.e. enabled
first, then path
) won't help either, because then the index would be used on enabled
only, but not on path
.
In general, the skiplist index will be used for the parts of the condition that can produce a contiguous range. That is the case if the left-most index attributes are used in equality comparisons (e.g. e.path == @path && e.enabled <= @timestamp
would work), but if its left-most index attributes are non-equality comparisons (e.g. e.path >= @lower && e.path <= @upper
or @e.enabled <= @timestamp
), then it won't look at its further index attributes, because it would produce a non-contiguous range anyway.
There's also the option of creating a skiplist index on disabled
. That allows the optimizer to use that index on the part e.enabled <= @timestamp && (e.disabled == null || @timestamp < e.disabled)
. It can transform this to e.disabled == null || @timestamp < e.disabled
, which however does not look very selective.
In summary: there does not seem to be a good index choice for that particular query. If you could somehow change the STARTS WITH
to an equality comparison, then you could create a combined skiplist index on path
and enabled
, and it would potentially be rather selective.
If your STARTS WITH
prefixes are always the same size, it might be worth to save the prefix in an extra attribute, that can be indexed instead of the original value and be queried using an equality comparison: e.pathPrefix == @prefix && e.enabled <= @timestamp
.
That requires saving and maintaining an extra prefix attribute per document, but maybe its worth it when enabling the use of a much more selective index.