Storing words with apostrophe in Lucene index

I have a company field in a Lucene index. One of the indexed company names is: Moody's

When a user types any of the following keywords, I want this company to come up in the search results:

1. Moo
2. Mood
3. Moodys
4. Moody's

How should I index this field in Lucene, and what type of Lucene query should I use to get this behaviour?

Thanks.

Tags: lucene lucene.net

2 answers

我欲成王，谁敢阻挡

Reply #2 · 2019-04-08 14:33

The StandardAnalyzer should work for 3 and 4, but it won't work for 1 and 2.

Rather than writing your own (complex) text analyzer, I would think about how you expect company names to be searched for. For example, Lucene's basic query syntax lets you find "Moody's" with wildcard searches such as "Moo*" and "Mood*". You might therefore consider appending a "*" to the search term before submitting it to Lucene. Be aware that this could confuse users who don't know a wildcard is being added under the hood.


贼婆χ

Reply #3 · 2019-04-08 14:59

Based on your clarifications, I want to divide your question into two, and answer each in turn:

1. How do I index words with apostrophes as equivalent to similar words without an apostrophe? e.g. mapping Moodys and Moody's to the same index term.
2. How do I implement auto-complete search in Lucene, i.e. given an index, find documents using word prefixes, e.g. map Moo to Moodys?

1 is relatively easy: use a StandardTokenizer to create a token combining the apostrophe and s with the previous word, then a StandardFilter to remove the apostrophe and s. This will convert Moody's to Moody. A StandardAnalyzer does this and much more (lowercasing and stop word removal), which may be more than you need. A stemmer should then map both Moodys and Moody to the same token; try SnowballFilter for this.
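To make the intended effect of that normalization concrete, here is a minimal plain-Java sketch. It does not use Lucene's API; `CompanyTermNormalizer` and `normalize` are hypothetical names, and dropping a trailing "s" is only a crude stand-in for what a real stemmer would do.

```java
import java.util.Locale;

final class CompanyTermNormalizer {

    // Lowercases a single token, drops a trailing possessive ('s), and then
    // drops one trailing "s". The last step is a crude stand-in for a real
    // stemmer and is used here only for illustration.
    static String normalize(String token) {
        String t = token.trim().toLowerCase(Locale.ROOT);
        if (t.endsWith("'s")) {
            t = t.substring(0, t.length() - 2);
        }
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Moody's")); // prints "moody"
        System.out.println(normalize("Moodys"));  // prints "moody"
    }
}
```

Whatever normalization is chosen, it has to be applied identically at index time and at query time, otherwise the two spellings end up as different terms again.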

2 is harder: Lucene's PrefixQuery, to which Alan alluded, will only work when the company name is the first word in a field. You need something like the answer to this question about auto-complete in Lucene.
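One common way to support prefix lookups on every word of a name is to index each word's prefixes (edge n-grams) as extra terms. The plain-Java sketch below illustrates only that idea; it is not Lucene's API and not necessarily what the linked answer proposes. `PrefixIndexSketch`, `clean`, `edgeNGrams`, and `matches` are hypothetical names, and the minimum prefix length of 3 is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class PrefixIndexSketch {

    // Assumed minimum prefix length; shorter prefixes are not indexed.
    static final int MIN_GRAM = 3;

    // Shared normalization: lowercase, trim, and remove apostrophes so that
    // "Moody's" and "Moodys" produce the same characters.
    static String clean(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replace("'", "");
    }

    // Splits the cleaned text on whitespace and returns every prefix of each
    // word, from MIN_GRAM characters up to the full word.
    static Set<String> edgeNGrams(String text) {
        Set<String> grams = new LinkedHashSet<>();
        for (String word : clean(text).split("\\s+")) {
            for (int end = MIN_GRAM; end <= word.length(); end++) {
                grams.add(word.substring(0, end));
            }
        }
        return grams;
    }

    // Cleans the user's input the same way and looks it up as a single term.
    static boolean matches(Set<String> indexedGrams, String userInput) {
        return indexedGrams.contains(clean(userInput));
    }

    public static void main(String[] args) {
        Set<String> grams = edgeNGrams("Moody's Corporation");
        for (String q : new String[] {"Moo", "Mood", "Moodys", "Moody's", "Corp"}) {
            System.out.println(q + " -> " + matches(grams, q));
        }
    }
}
```

In this sketch all four keywords from the question match, and so does a prefix of the second word. Lucene's analysis module includes an EdgeNGramTokenFilter that can generate such terms at index time; its constructor arguments differ between versions, so check the documentation for the release you use.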
