Is my understanding of query processing correct?
- The first filter query either gets its DocSet from the cache or creates one (an OpenBitSet or SortedVIntSet implementation) and caches it.
- Every other filter likewise gets its DocSet from the cache or creates its own DocBitSet, which is then intersected with the first one (the efficiency of this step depends on the implementation of the first DocSet).
- The main query and the final DocSet (after all intersections) are leapfrogged using Lucene's filter+query search (the efficiency of this step also depends on the implementation of the first DocSet).
- Post filters (cost > 100 && cache==false) are applied as an AND with the original query.
So, as a consequence, performance depends on the first filter, since SortedIntSet is more efficient for a small query and BitSet is better for a big one. Am I correct?
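To make that assumption concrete, this is roughly the cost model I have in mind. It is a minimal sketch in plain Java, not Solr's actual code: `java.util.BitSet` and a sorted `int[]` are stand-ins for the bitset and sorted-int DocSet implementations, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for the two DocSet flavours; not Solr's classes.
class DocSetIntersectSketch {

    // Small sorted id list against a bitset: one constant-time probe per id,
    // so the cost grows with the size of the small set.
    static int[] intersectSortedWithBits(int[] sortedIds, BitSet bits) {
        int[] out = new int[sortedIds.length];
        int n = 0;
        for (int id : sortedIds) {
            if (bits.get(id)) {
                out[n++] = id;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Bitset against bitset: word-wise AND, so the cost grows with the
    // length of the bitsets rather than with the number of matches.
    static BitSet intersectBits(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet big = new BitSet();
        big.set(0, 1000);                 // docs 0..999 match the big filter
        int[] small = {3, 500, 2000};     // docs matching the small filter
        System.out.println(Arrays.toString(intersectSortedWithBits(small, big)));
    }
}
```

If the first DocSet is the small sorted one, every later intersection stays cheap; if it is a large bitset, every later step pays for its full length.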
Second part of the question: DocSet has two main implementations, HashDocSet and SortedIntDocSet. Each intersection implementation iterates over all entries in the first filter and checks whether each one is also in the second DocSet. That means the filters have to be sorted by size, smallest first. Is it possible to control the order of cached filters (cost only works for non-cached filters)?
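Here is a small sketch of why I think the order matters. It only models the behaviour described above (iterate the first set, probe the second) and is an assumption about how the intersection works, not Solr's real implementation; `FilterOrderSketch`, `intersectAll`, and the `probes` counter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of "iterate the first set, probe the second".
class FilterOrderSketch {
    static long probes = 0;   // number of membership checks performed

    static Set<Integer> intersect(Set<Integer> first, Set<Integer> second) {
        Set<Integer> out = new HashSet<>();
        for (int doc : first) {
            probes++;
            if (second.contains(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    // Intersects all filters left to right, optionally smallest first.
    static Set<Integer> intersectAll(List<Set<Integer>> filters, boolean smallestFirst) {
        List<Set<Integer>> ordered = new ArrayList<>(filters);
        if (smallestFirst) {
            ordered.sort(Comparator.comparingInt(Set::size));
        }
        Set<Integer> acc = ordered.get(0);
        for (int i = 1; i < ordered.size(); i++) {
            acc = intersect(acc, ordered.get(i));
        }
        return acc;
    }
}
```

With one filter matching 1000 docs and another matching 3, the given order costs 1000 probes in this model, while smallest-first costs 3, and both return the same result. That is the ordering I would like to control for cached filters.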