I am designing an algorithm, and before I do, I need to understand how Solr handles results when it has to AND them together.
Consider this scenario:
id  Country  City
1   India    Bangalore
2   America  New York
3   France   Paris
4   America  Los Angeles
Now suppose my query is country = America AND city = Los Angeles.
Will Solr work like this?
Take all ids for country = America, i.e. (2, 4).
Then take all ids for city = Los Angeles, i.e. (4).
Then find the ids common to both result sets, i.e. (4).
If this is how AND is resolved, doesn't it have high complexity?
And even higher complexity if there are more ANDs?
Can anyone clear up my doubts?
EDIT: Here is a use case that shows my requirements more clearly.
Id (unique)  returnMe  desc                     name     value
1            user1     all those living in usa  country  USA
2            user2     all those like game      game     football
3            user1     my hobbies are           hobby    guitar
Now how can I get returnMe for the following queries?
1. All users who live in the USA AND whose hobby is guitar.
2. All users who live in the USA OR whose game is football.
The answer for the first query should be user1.
The answer for the second query should be user1 and user2.
Thanks
Solr can do complex boolean operations across millions of docs really fast. Data goes into an inverted index of bitsets. I am no expert on it, but I hope this illustration helps:
Documents [1,2,3,4]
country:america : "0101" (in the bitset, 0 means absent and 1 means present)
city:los angeles : "0001"
and so
country:america and city:los angeles => "0101" AND "0001" => "0001"
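The illustration above can be reproduced as a rough sketch in Python. It uses a plain integer as the bitset, which is an assumption made for the example and not how Solr stores anything internally; the helper names `to_bitset` and `from_bitset` are made up for this sketch.

```python
# Toy bitsets over documents 1..4; the leftmost bit stands for document 1.
docs = [1, 2, 3, 4]

def to_bitset(matching_ids, all_ids):
    """Pack membership into an int, most significant bit = first document."""
    bits = 0
    for doc_id in all_ids:
        bits = (bits << 1) | (1 if doc_id in matching_ids else 0)
    return bits

def from_bitset(bits, all_ids):
    """Return the document ids whose bit is set."""
    width = len(all_ids)
    return [doc_id for i, doc_id in enumerate(all_ids)
            if (bits >> (width - 1 - i)) & 1]

country_america = to_bitset({2, 4}, docs)     # "0101"
city_los_angeles = to_bitset({4}, docs)       # "0001"

both = country_america & city_los_angeles     # AND of the two bitsets
either = country_america | city_los_angeles   # OR of the two bitsets

print(format(both, "04b"), from_bitset(both, docs))      # 0001 [4]
print(format(either, "04b"), from_bitset(either, docs))  # 0101 [2, 4]
```

Each extra AND or OR is one more bitwise operation over the same fixed-size bitset, so the cost grows with the number of clauses rather than with the size of each clause's result set.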
A 1,000,000-byte bitset can represent at least 1,000,000 documents (up to 8,000,000 at one bit per document), and your computer can read it in roughly 19 microseconds from RAM and 2 milliseconds from disk. CPUs are also naturally fast at boolean operations (the CPUs in our Solr servers are hardly busy even at hundreds of millions of documents).
And so Solr can do complex boolean operations across millions of docs really fast.
Bitsets might come into it in the case where a Filter is used: results of filters get cached in memory as bitsets for fast lookup.
But in the general case, what happens is that Lucene creates an iterator for each term; in your example there would be an iterator for America and another for Los Angeles. Then Lucene iterates over these, and (in the case of AND) combines them by finding docids that exist in all the iterators. This can be done very efficiently by: (1) iterating first over the iterator with the fewest total number of matches, and (2) skipping over any docids that are < the current matching docid. Because docids are (usually) scored in order, this can be done. In your example, the scorer for the Los Angeles term would be evaluated first because of its lower number of matching docs; the first match is "4". Then the scorer for the America term would be evaluated, and told to skip ahead to "4" - a match is found, and then both iterators terminate.
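As a rough sketch of the two tricks described above (lead with the rarest term, then skip ahead), here is a small Python version. It is not Lucene's code: `PostingIterator`, `cost`, `advance` and `intersect` are hypothetical names for this example, and each term's matches are assumed to be a sorted in-memory list of docids.

```python
from bisect import bisect_left

class PostingIterator:
    """Hypothetical iterator over the sorted docids that match one term."""

    def __init__(self, doc_ids):
        self.doc_ids = sorted(doc_ids)
        self.pos = 0

    def cost(self):
        # Total number of matches; used to pick the iterator to lead with.
        return len(self.doc_ids)

    def current(self):
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else None

    def advance(self, target):
        """Skip to the first docid >= target; return it, or None if exhausted."""
        self.pos = bisect_left(self.doc_ids, target, self.pos)
        return self.current()

def intersect(iterators):
    """AND together one or more iterators, returning docids present in all."""
    iterators = sorted(iterators, key=lambda it: it.cost())
    lead, others = iterators[0], iterators[1:]
    matches = []
    candidate = lead.current()
    while candidate is not None:
        for other in others:
            doc = other.advance(candidate)
            if doc is None:
                return matches            # one term is exhausted: no more matches
            if doc != candidate:
                # This term has no such docid; jump the lead forward and retry.
                candidate = lead.advance(doc)
                break
        else:
            matches.append(candidate)     # every iterator landed on candidate
            candidate = lead.advance(candidate + 1)
    return matches

america = PostingIterator([2, 4])
los_angeles = PostingIterator([4])
print(intersect([america, los_angeles]))  # [4]
```

In this sketch the Los Angeles iterator leads because it has one match, and the America iterator is asked to skip straight to docid 4, so docid 2 is never compared against anything.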
The summary is: don't worry about this. The performance of this sort of thing is very good with Lucene and Solr; it's the main reason they have become so widely accepted.