Lucene search and underscores

Posted 2019-01-25 11:11

When I use Luke to search my Lucene index with the standard analyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".

Is there a simple way to escape the underscore (_) character so that it will search for it?

EDIT:

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer in Lucene 2.9.1, and it was probably there in earlier versions too. Load up Luke and try to search for "BB_HHH_FFFF5_SSSS". When the value contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point, I'm leaning towards a tokenizer bug, unless the presence of a number is supposed to trigger this behavior, but I fail to see why it would.

Can anyone confirm this?

Tags: lucene lucene.net underscores

2 answers

走好不送

#2 · 2019-01-25 11:50

I don't think you'll be able to use the standard analyser for this use case.

Judging by what I think your requirements are, the keyword analyser should work fine with little effort (the whole field becomes a single term).

I think some of the confusion arises when looking at the field with Luke. The stored value is not what queries match against; what matters are the indexed terms. I suspect that when you look at the terms stored for your field, they'll be "my" and "value".
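To make the stored-value versus terms distinction concrete, here is a rough sketch in plain Java. It is a toy model, not Lucene code: splitAnalyze and keywordAnalyze are hypothetical helpers that only imitate the idea of a splitting analyzer and a whole-field analyzer, and they do not reproduce the actual grammar of the standard or keyword analyzers. The point is simply that a query matches only when its terms line up with the terms that were indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: these helpers imitate the idea of analysis and are NOT
// Lucene's real StandardAnalyzer or KeywordAnalyzer implementations.
class TermsVsStoredValue {

    // Hypothetical "splitting" analysis: lowercase, then break on underscores.
    static List<String> splitAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("_")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Hypothetical "keyword" analysis: the whole input becomes one unmodified term.
    static List<String> keywordAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    // Phrase-like match: the query terms must appear consecutively among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty()
                && Collections.indexOfSubList(indexedTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        String stored = "MY_VALUE";

        // Same analysis on both sides: the terms line up.
        System.out.println(matches(splitAnalyze(stored), splitAnalyze("MY_VALUE")));
        System.out.println(matches(keywordAnalyze(stored), keywordAnalyze("MY_VALUE")));

        // Mismatched analysis: indexed as "my", "value" but queried as one term.
        System.out.println(matches(splitAnalyze(stored), keywordAnalyze("MY_VALUE")));
    }
}
```

In this model the first two checks succeed and the third fails, which is the same kind of mismatch you get when the terms in the index and the terms produced from the query come from different analyzers.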

Hope this helps,


Bombasti

#3 · 2019-01-25 12:00

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
