How to search phrase queries in inverted index str

If we want to search a query like this "t1 t2 t3" (t1,t2 ,t3 must be queued) in an inverted index structure , which ways should we do ?

1-First we search the "t1" term and find all documents that contains "t1" , then do this work for "t2" and then "t3" . Then find documents that positions of "t1" , "t2" and "t3" are next to each other .

2-First we search the "t1" term and find all documents that contains "t1" , then in all documents that we found , we search the "t2" and next , in the result of this , we find documents that contains "t3" .

I have a full inverted index . I want to know which ways above is optimized , (1) or (2) ?

thanks a lot.

标签： search phrase inverted-index

1条回答

再贱就再见

2楼-- · 2019-03-30 15:47

As the wikipedia entry well explains,

There are two main variants of inverted indexes: A record level inverted index (or inverted file index or just inverted file) contains a list of references to documents for each word. A word level inverted index (or full inverted index or inverted list) additionally contains the positions of each word within a document. The latter form offers more functionality (like phrase searches), but needs more time and space to be created.

Since you don't tell us which variant you have, we can't really answer your question precisely, but thinking about each possibility will help.

To open and search documents is typically a costly operation, unless your documents are unusually small, so you want to minimize that -- and option (2) doesn't really minimize it. If you have an inverted list, with option (1) you won't even need to open any document; if you only have an inverted file, you'll inevitably need to open documents and scan them (since you otherwise lack information to confirm word adjacency) -- but at least with option (1) you minimize the number of documents you have to open and scan (only those in the intersection of the lists of documents containing each word).

So, in either case, option (1) is more promising (unless your documents are peculiarly small).

0人赞添加讨论(0) 举报

How to search phrase queries in inverted index str

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间