Lucene's manual has explained the meaning of proximity search for a phrase with two words clearly, such as the "jakarta apache"~10
example in
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches
However, I am wondering what does a search like "jakarta apache lucene"~10
exactly do? Does it allow neighboring words to be at most 10 words apart, or all pairs of words to be that?
Thanks!
The slop (proximity) works like an edit distance (see
PhraseQuery.setSlop
). So, the terms could be reordered or have extra terms added. This means that the proximity would be the maximum number of terms added into the whole query. That is:Will match:
But not:
Some people have been confused by:
The simple explanation is that swapping terms takes two edits, so:
The longer, but more accurate, explanation is that every edit allows a term to be moved by one position. The first move of a swap transposes two terms on top of each other. Keeping this in mind explains why any set of three terms can be rearranged into any order with distance no greater than 4.