Analyzer for Russian language in Lucene and Lucene.Net

Posted 2019-02-03 10:24

Lucene has quite poor support for the Russian language.

RussianAnalyzer (part of lucene-contrib) is of very low quality.

The RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead.

Do you know any better solutions?

Tags: java .net lucene
5 answers
家丑人穷心不美
#2 · 2019-02-03 10:57

The project at http://code.google.com/p/russianmorphology/ has moved to https://github.com/AKuznetsov/russianmorphology. Please use the new location.

甜甜的少女心
#3 · 2019-02-03 10:58

That's the beauty of open source. You have the source code, so if the current implementations don't work for you, you can always create your own or, even better, extend the existing ones. A good place to start would be the book "Lucene in Action".

该账号已被封号
#4 · 2019-02-03 11:10

If all else fails, use Sphinx.

相关推荐>>
#5 · 2019-02-03 11:12

My answer is probably too late, but for the record, I've found the analyzers from the AOT project much better than those shipped with Lucene.
