While reading "Lucene in Action", 2nd edition, I came across the description of the Filter classes, which can be used for result filtering in Lucene. Lucene has a lot of filters that mirror Query classes, for example NumericRangeQuery and NumericRangeFilter.
The book says that NumericRangeFilter does exactly the same as NumericRangeQuery but without document scoring. Does this mean that if I do not need scoring, or I sort documents by a field value, I should prefer filtering over querying from a performance point of view?
I received a great answer from Uwe Schindler; let me repost it here.
In contrast to Dennis' answer: no, you probably don't want to use a filter unless you're going to reuse the same query multiple times.
A NumericRangeFilter is just a subclass of MultiTermQueryWrapperFilter, which means that essentially it runs the wrapped query up front and records every matching document in a bit set sized to the whole index. So it will run in linear time over your index instead of logarithmic time like a normal query.
Additionally, the filter will take up more memory (one bit for every doc in your index).
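As a rough illustration of the bit-set behaviour described above, here is a toy model in plain Java. It does not use Lucene: the class RangeFilterSketch, its method names, and the tiny hand-built index are all made up for this sketch. It only tries to show why a filter's memory footprint follows the size of the index rather than the number of hits.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: this is not Lucene code, and every name here is hypothetical.
class RangeFilterSketch {

    // Tiny inverted index for one numeric field: value -> ids of the documents holding it.
    static NavigableMap<Integer, int[]> sampleIndex() {
        NavigableMap<Integer, int[]> postings = new TreeMap<>();
        postings.put(10, new int[] {0, 3});
        postings.put(20, new int[] {1});
        postings.put(30, new int[] {2, 4});
        return postings;
    }

    // Filter-style evaluation: reserve one bit per document in the index,
    // then set the bit of every document whose value lies in [lo, hi].
    static BitSet rangeAsBitSet(NavigableMap<Integer, int[]> postings, int maxDoc, int lo, int hi) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postings.subMap(lo, true, hi, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Five documents in the index, range 10..20 inclusive.
        BitSet hits = rangeAsBitSet(sampleIndex(), 5, 10, 20);
        System.out.println(hits); // prints {0, 1, 3}
    }
}
```

The bit set is sized from maxDoc, so even a range that matches a handful of documents reserves space for all of them.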
If you're going to be using the same query over and over again, then it's probably worth it to you to pay the performance/memory hit once and have later usages be faster. But if it's a one-off query, it's almost certainly not worth it.
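The pay-once, reuse-later trade-off can be sketched in plain Java as well. Again, this is a hedged toy model rather than Lucene's implementation: CachedFilterSketch, its string cache key, and the builds counter are assumptions made for illustration. Lucene's own filter caching is, as far as I know, tied to the index reader rather than to a string key.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: a hypothetical cache of filter bit sets, not Lucene code.
class CachedFilterSketch {

    private final Map<String, BitSet> cache = new HashMap<>();

    // Counts how many times a bit set actually had to be built.
    int builds = 0;

    // Returns the cached bit set for this key, building it only on first use.
    BitSet get(String key, Supplier<BitSet> builder) {
        return cache.computeIfAbsent(key, k -> {
            builds++;
            return builder.get();
        });
    }

    public static void main(String[] args) {
        CachedFilterSketch filters = new CachedFilterSketch();
        Supplier<BitSet> expensive = () -> {
            BitSet bits = new BitSet(5); // one bit per document
            bits.set(1);
            return bits;
        };
        filters.get("price:[10 TO 20]", expensive); // builds the bit set
        filters.get("price:[10 TO 20]", expensive); // served from the cache
        System.out.println(filters.builds); // prints 1
    }
}
```

For a one-off query the cache never gets a second hit, so the up-front cost is never paid back.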
(Also, if you're going to reuse it, use a CachingWrapperFilter so that the filter is cached.)
I found this at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, which seems to suggest using filters rather than queries. Intuitively that makes more sense to me, as they should do pretty much the same thing; the only difference is that filters do not contribute to the score.
If the filter will be reused, it is wise to use it instead of a query, because its result can be cached. If you are not going to use the scoring or field values, it also makes sense to use a filter over a query.
Hope this helps.