If we want to search a query like this "t1 t2 t3" (t1,t2 ,t3 must be queued) in an inverted index structure , which ways should we do ?
1-First we search the "t1" term and find all documents that contains "t1" , then do this work for "t2" and then "t3" . Then find documents that positions of "t1" , "t2" and "t3" are next to each other .
2-First we search the "t1" term and find all documents that contains "t1" , then in all documents that we found , we search the "t2" and next , in the result of this , we find documents that contains "t3" .
I have a full inverted index . I want to know which ways above is optimized , (1) or (2) ?
thanks a lot.
As the wikipedia entry well explains,
Since you don't tell us which variant you have, we can't really answer your question precisely, but thinking about each possibility will help.
To open and search documents is typically a costly operation, unless your documents are unusually small, so you want to minimize that -- and option (2) doesn't really minimize it. If you have an inverted list, with option (1) you won't even need to open any document; if you only have an inverted file, you'll inevitably need to open documents and scan them (since you otherwise lack information to confirm word adjacency) -- but at least with option (1) you minimize the number of documents you have to open and scan (only those in the intersection of the lists of documents containing each word).
So, in either case, option (1) is more promising (unless your documents are peculiarly small).