
Question:

I am writing an algorithm, and before I do I need to understand how Solr handles results when it has to AND them together.

Consider this scenario:

    id    Country    City
    1     India      Bangalore
    2     America    New York
    3     France     Paris
    4     America    Los Angeles

Now suppose my query is country = America AND city = Los Angeles. Will Solr work like this?

Take all ids for country = America, i.e. (2, 4).
Then take all ids for city = Los Angeles, i.e. (4).
Then find the ids common to both result sets, i.e. (4).

If that is how AND is resolved, doesn't it have high complexity? It would be even higher with more ANDs.

Can anyone clear up my doubts?

EDIT: Here is a use case that shows my requirements more clearly.

    Id (unique)    returnMe    desc                       name       value
    1              user1       all those living in usa    country    USA
    2              user2       all those like game        game       football
    3              user1       my hobbies are             hobby      guitar

Now how can I get returnMe for the following queries?

 1. All users who live in the USA AND whose hobby is guitar.
 2. All users who live in the USA OR whose game is football.

The answer to the first query should be user1.
The answer to the second query should be user1 and user2.

Thanks

Answer 1:

Solr can do complex boolean operations across millions of docs really fast. Data goes into an inverted index of bitsets. I am no expert on it, but I hope this illustration helps:

Documents [1,2,3,4]
country:america : "0101" (in the bitset, 0 for absent and 1 for present)
city:los angeles : "0001"

and so

country:america and city:los angeles => "0101" AND "0001" => "0001"
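
The illustration above can be sketched in a few lines of Python. This is only a toy model of the idea, not Solr's actual code: the helper names (`to_bitset`, `render`, `to_ids`) are made up for this example, and a plain Python integer stands in for the bitset, with document 1 stored in the lowest bit.

```python
def to_bitset(matching_ids):
    """Build an int whose bit (doc_id - 1) is set for each matching document."""
    bits = 0
    for doc_id in matching_ids:
        bits |= 1 << (doc_id - 1)
    return bits


def render(bits, num_docs):
    """Render the bitset as a string with document 1 on the left."""
    return "".join("1" if (bits >> i) & 1 else "0" for i in range(num_docs))


def to_ids(bits):
    """Turn a bitset back into a sorted list of document ids."""
    ids = []
    doc_id = 1
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


NUM_DOCS = 4
country_america = to_bitset([2, 4])   # documents 2 and 4
city_los_angeles = to_bitset([4])     # document 4 only

# AND and OR are each a single bitwise operation over the whole set.
both = country_america & city_los_angeles
either = country_america | city_los_angeles

print(render(country_america, NUM_DOCS), "AND",
      render(city_los_angeles, NUM_DOCS), "=>", render(both, NUM_DOCS))
print("matching ids:", to_ids(both))
```

The point of the sketch is that the cost of the AND does not depend on how many documents match each term, only on the size of the bitset.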

A 1,000,000-byte bitset holds 8,000,000 bits, so at one bit per document it covers 1,000,000 documents many times over, and your computer can read that much in roughly 19 microseconds from RAM or 2 milliseconds from disk. CPUs are also naturally fast at boolean operations (the CPUs in our Solr servers are hardly busy even at hundreds of millions of documents).

And so Solr can do complex boolean operations across millions of docs really fast.

Answer 2:

Bitsets might come into it in the case where a Filter is used: results of filters get cached in memory as bitsets for fast lookup.

But in the general case, what happens is that Lucene creates an iterator for each term; in your example there would be an iterator for America and another for Los Angeles. Then Lucene iterates over these, and (in the case of AND) combines them by finding docids that exist in all the iterators. This can be done very efficiently by: (1) iterating first over the iterator with the fewest total matches, and (2) skipping over any docids that are less than the current matching docid. This works because docids are (usually) scored in order. In your example, the scorer for the Los Angeles term would be evaluated first because of its lower number of matching docs; the first match is "4". Then the scorer for the America term would be evaluated and told to skip ahead to "4". A match is found, and then both iterators terminate.
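
The two steps described above can be sketched roughly as follows. This is a simplified illustration, not Lucene's implementation: `PostingIterator`, `cost`, `advance` and `intersect` are hypothetical names chosen for this example, and each term's matches are assumed to be a plain sorted list of docids held in memory.

```python
from bisect import bisect_left


class PostingIterator:
    """Iterates over one term's docids, which must be sorted ascending."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = 0

    def cost(self):
        """Total number of matches for this term."""
        return len(self.doc_ids)

    def current(self):
        """Docid at the current position, or None when exhausted."""
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else None

    def advance(self, target):
        """Skip ahead to the first docid >= target and return it (or None)."""
        self.pos = bisect_left(self.doc_ids, target, self.pos)
        return self.current()


def intersect(iterators):
    """Return the docids present in every iterator (an AND query)."""
    # Step (1): lead with the iterator that has the fewest matches.
    iterators = sorted(iterators, key=lambda it: it.cost())
    lead, others = iterators[0], iterators[1:]
    matches = []
    candidate = lead.current()
    while candidate is not None:
        for other in others:
            # Step (2): skip every docid smaller than the candidate.
            doc = other.advance(candidate)
            if doc is None:
                return matches          # one term is exhausted, so we are done
            if doc > candidate:
                candidate = lead.advance(doc)   # candidate cannot match; jump
                break
        else:
            matches.append(candidate)   # every iterator agreed on this docid
            candidate = lead.advance(candidate + 1)
    return matches


america = PostingIterator([2, 4])
los_angeles = PostingIterator([4])
print(intersect([america, los_angeles]))
```

With the example data, the Los Angeles iterator leads, proposes docid 4, and the America iterator skips straight to 4 without ever being asked about docid 2.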

In summary: don't worry about this. The performance of this sort of thing is very good with Lucene and Solr; it's the main reason they have become so widely accepted.