Lucene.NET - checking if document exists in index

I have the following code, using Lucene.NET V4, to check if a file exists in my index.

bool exists = false;
IndexReader reader = IndexReader.Open(Lucene.Net.Store.FSDirectory.Open(lucenePath), false);
Term term = new Term("filepath", "\\myFile.PDF");
TermDocs docs = reader.TermDocs(term);
if (docs.Next())
{
   exists = true;
}

The file myFile.PDF definitely exists in the index, but the check always comes back false. When I inspect docs in the debugger, its Doc and Freq properties report that they "threw an exception of type 'System.NullReferenceException'".

Tags: c# .net lucene lucene.net

2 Answers

叛逆

Post #2 · 2019-08-20 02:38

You may have analyzed the field "filepath" during indexing with an analyzer that tokenizes or otherwise changes the content. For example, the StandardAnalyzer tokenizes, lowercases, and (if configured) removes stop words, so the indexed terms no longer match the raw path.

If you only need to query by the exact file path, as in your example, use the KeywordAnalyzer for this field during indexing.
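A minimal sketch of that suggestion, assuming the Lucene.NET 3.0.x API that the question's code appears to use (IndexReader.Open, TermDocs). The class name ExactPathDemo and the in-memory RAMDirectory are illustrative assumptions, not part of the original post. A PerFieldAnalyzerWrapper applies KeywordAnalyzer to "filepath" only, so the whole path is stored as one unmodified term:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class ExactPathDemo
{
    // Indexes one document whose "filepath" field is kept as a single,
    // unmodified term, then looks it up with an exact Term.
    public static bool IndexAndCheck(string path)
    {
        // StandardAnalyzer for all other fields, KeywordAnalyzer for "filepath".
        var analyzer = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
        analyzer.AddAnalyzer("filepath", new KeywordAnalyzer());

        using (var directory = new RAMDirectory())
        {
            using (var writer = new IndexWriter(directory, analyzer, true,
                                                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                doc.Add(new Field("filepath", path,
                                  Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            } // disposing the writer commits the document

            using (var reader = IndexReader.Open(directory, true))
            using (var docs = reader.TermDocs(new Term("filepath", path)))
            {
                // Next() returns true if at least one live document has the term.
                return docs.Next();
            }
        }
    }
}
```

Indexing the field with Field.Index.NOT_ANALYZED should have the same effect for a single field without needing a per-field analyzer.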

If you can't re-index at the moment, you need to find out which analyzer was used during indexing and use it to create your query. You have two options:

1. Use a query parser with the right analyzer and parse the query filepath:\\myFile.PDF. If the resulting query is a TermQuery, you can use its term as you did in your example. Otherwise, perform a search with the query.
2. Use the Analyzer directly to create the terms from the TokenStream object. Again, if there is only one term, proceed as you did; if there are multiple terms, create a phrase query.
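Both options can be sketched as follows, again assuming the Lucene.NET 3.0.x API. The class and method names (AnalyzedQueryBuilder, FromAnalyzer, FromParser) are hypothetical, and the analyzer passed in must be whichever one was actually used at index time:

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class AnalyzedQueryBuilder
{
    // Option 2: run the text through the analyzer and build the query by hand.
    public static Query FromAnalyzer(Analyzer analyzer, string field, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.TokenStream(field, new StringReader(text)))
        {
            var termAttr = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAttr.Term);
            }
            stream.End();
        }

        if (terms.Count == 0)
        {
            return new BooleanQuery(); // no clauses: matches nothing
        }
        if (terms.Count == 1)
        {
            return new TermQuery(new Term(field, terms[0]));
        }

        // Several tokens: require them to appear consecutively.
        // This simple version ignores position gaps left by removed stop words.
        var phrase = new PhraseQuery();
        foreach (string t in terms)
        {
            phrase.Add(new Term(field, t));
        }
        return phrase;
    }

    // Option 1: let the query parser apply the analyzer.
    public static Query FromParser(Analyzer analyzer, string field, string text)
    {
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, field, analyzer);
        // Escape() protects backslashes and other query-syntax characters.
        return parser.Parse(QueryParser.Escape(text));
    }
}
```

If the result is a TermQuery, its Term can be passed to reader.TermDocs as in the question; otherwise run the query through an IndexSearcher and check whether there are any hits.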


老娘就宠你

Post #3 · 2019-08-20 02:41

First of all, it's good practice to reuse a single IndexReader instance if you don't need to deal with deleted documents. It performs better and it's thread-safe, so you can keep it in a static read-only field. (I can see that you're passing false for the readOnly parameter, so if that is intentional, just ignore this paragraph.)
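One way to arrange this is a small wrapper that opens a read-only reader once and reuses it for every lookup. This is only a sketch against the Lucene.NET 3.0.x API; the IndexLookup class is a hypothetical name, not something the library provides:

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Store;

public sealed class IndexLookup : IDisposable
{
    private readonly IndexReader reader;

    public IndexLookup(Directory directory)
    {
        // readOnly: true, since this reader is never used to delete documents.
        reader = IndexReader.Open(directory, true);
    }

    // True if at least one live document contains the exact term.
    public bool Exists(string field, string value)
    {
        using (TermDocs docs = reader.TermDocs(new Term(field, value)))
        {
            return docs.Next();
        }
    }

    public void Dispose()
    {
        reader.Dispose();
    }
}
```

A single instance could then be held in a static read-only field and shared across threads. Note that a reader only sees the index as it was when opened, so it needs to be reopened to pick up later changes.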

As for your case, are you tokenizing the filepath field values? If you are (e.g. by using StandardAnalyzer when indexing/searching), you will probably have trouble finding values such as \myFile.PDF: the analyzer lowercases the text, and with the default tokenizer the value may be split into myFile and PDF (I'm not sure what happens to the leading backslash).
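To remove the guesswork, you can print the tokens an analyzer actually produces for a given value. The helper below is a sketch assuming the Lucene.NET 3.0.x TokenStream API; AnalyzerProbe is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

public static class AnalyzerProbe
{
    // Returns the terms the analyzer emits for the given field value.
    public static List<string> Tokens(Analyzer analyzer, string field, string text)
    {
        var tokens = new List<string>();
        using (TokenStream stream = analyzer.TokenStream(field, new StringReader(text)))
        {
            var termAttr = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                tokens.Add(termAttr.Term);
            }
            stream.End();
        }
        return tokens;
    }
}
```

Calling AnalyzerProbe.Tokens with a StandardAnalyzer, "filepath" and "\\myFile.PDF" shows exactly which terms ended up in the index. The Term passed to reader.TermDocs has to match one of them character for character.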

Hope this helps.
