If you have a double compound index { a : 1, b : 1}, it makes sense to me that the index won't be used if you query on b alone (i.e. you cannot "skip" a in your query). The index will however be used if you query on a alone.
However, given a triple compound index { a : 1, b: 1, c: 1} my explain command is showing that the index is used when you query on a and c (i.e. you can "skip" b in your query).
How can Mongo use an abc index on a query for ac, and how effective is the index in this case?
Background:
My use case is that sometimes I want to query on a,b,c and sometimes I want to query on a,c. Now should I create only 1 index on a,b,c or should I create one on a,c and one on a,b,c?
(It doesn't make sense to create an index on a,c,b because c is a multi-key index with good selectivity.)
You can view querying on A and C as a special case of querying on A (in which case the index would be used). Using the index is more efficient than having to load the whole document.
Suppose you wanted to get all documents with A between 7 and 13, and C between 5 and 8.
If you had an index on A only: the database could use the index to select documents with A between 7 and 13 but, to make sure that C was between 5 and 8, it would have to retrieve the corresponding documents too.
If you had an index on A, B, and C: the database could use the index to select documents with A between 7 and 13. Since the values of C are already stored in the records of the index, it could determine whether the correponding documents also match the C criterion, without having to retrieve those documents. Therefore, you would avoid disk reads, with better performance.
bottom line / tl;dr: Index
b
can be 'skipped' ifa
andc
are queried for equality or inequality, but not, for instance, for sorts onc
.This is a very good question. Unfortunately, I couldn't find anything that authoritatively answers this in greater detail. I believe the performance of such queries has improved over the last years, so I wouldn't trust old material on the topic.
The whole thing is quite complicated because it depends on the selectivity on your indexes and whether you query for equality, inequality and/or sort, so
explain()
is your only friend, but here are some things I found:Caveat: What comes now is a mixture of experimental results, reasoning and guessing. I might be stretching Kyle's analogy too far, and I might even be completely wrong (and unlucky, because my test results loosely match my reasoning).
It is clear that the index of A can be used, which, depending on the selectivity of A, is certainly very helpful. 'Skipping' B can be tricky, or not. Let's keep this similar to Kyle's cookbook example:
If you now ask me to find some French dish called "Chateaubriand", I can use index
A
and, because I don't know the ingredient, will have to scan all dishes inA
. On the other hand, I do know that the list of dishes in each category is sorted through the indexC
, so I will only have to look for the strings starting with, say, "Cha" in each ingredient-list. If there are 50 ingredients, I will need 50 lookups instead of just one, but that is a lot better than having to scan every French dish!If you asked me to give you an alphabetically sorted list of all French dishes, though, I'd be in trouble. Now the index on
C
is worthless, I'd have to merge-sort all those index lists. I will have to scan every element to do so.This reflects in my tests. Here are some simplified results. The original collection has datetimes, ints and strings, but I wanted to keep things simple, so it's now all ints.
Essentially, there are only two classes of queries: those where
nscanned
<= 2 *limit
, and those that have to scan the entire collection (120k documents). The index is{a, b, c}
:Your mileage will vary.