Apache Lucene - Improving the results of Spell Che

Posted 2019-07-12 19:22

I recently implemented a SpellChecker using Apache Lucene. My code is provided below:

public void loadDictionary() {
    try {
        File dir = new File("c:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        spellChecker = new SpellChecker(directory);
        Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
        spellChecker.indexDictionary(dictionary, config, false);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public String performSpellCheck(String word) {
    try {
        String[] suggestions = spellChecker.suggestSimilar(word, 1);
        if (suggestions.length > 0) {
            return suggestions[0];
        } else {
            return word;
        }
    } catch (Exception e) {
        return "Error";
    }
}

The above code uses a dictionary of English words. I am having a problem with the accuracy. What I want it to do is suggest similar words to words that are spelled incorrectly (that is, words that do not appear in the dictionary being used). However, if I send the word "post" to the performSpellCheck method, it returns "poet", that is, it is correcting words that do not need to be corrected (these words exist in the dictionary file).

Any suggestions on how I can improve my results?

Tags: java performance apache lucene spell-checking

1 answer

萌系小妹纸

Reply #2 · 2019-07-12 19:26

I think you should use the SpellChecker.exist() method to check whether the word is already in the dictionary. Call suggestSimilar only if the word does not exist there; otherwise return the word unchanged.
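
A minimal sketch of that guard, written so it runs without Lucene on the classpath. The WordIndex interface and the GuardedSpellCheck class are hypothetical names introduced only for illustration; their two methods mirror the shape of Lucene's SpellChecker.exist(String) and SpellChecker.suggestSimilar(String, int), so a thin adapter delegating to a real SpellChecker instance should be able to implement the interface. The stub in main stands in for the indexed dictionary and is not real Lucene behaviour.

```java
import java.io.IOException;
import java.util.Set;

final class GuardedSpellCheck {

    /** Hypothetical abstraction; Lucene's SpellChecker has methods of the same shape. */
    interface WordIndex {
        boolean exist(String word) throws IOException;
        String[] suggestSimilar(String word, int numSug) throws IOException;
    }

    /** Returns the word unchanged if it is in the index, otherwise the top suggestion. */
    static String correct(WordIndex index, String word) throws IOException {
        if (index.exist(word)) {
            return word; // correctly spelled: do not ask for suggestions at all
        }
        String[] suggestions = index.suggestSimilar(word, 1);
        // Fall back to the original word when nothing similar is found.
        return suggestions.length > 0 ? suggestions[0] : word;
    }

    public static void main(String[] args) throws IOException {
        // Stub dictionary (assumption): it always suggests "poet", like the case in the question.
        Set<String> dictionary = Set.of("post", "poet");
        WordIndex stub = new WordIndex() {
            public boolean exist(String w) { return dictionary.contains(w); }
            public String[] suggestSimilar(String w, int n) { return new String[] {"poet"}; }
        };
        System.out.println(correct(stub, "post")); // known word, returned as-is
        System.out.println(correct(stub, "pxet")); // unknown word, replaced by the suggestion
    }
}
```

In the code from the question, the same check would go at the top of performSpellCheck, before the call to suggestSimilar. Note that both Lucene methods can throw IOException, so it may be better to let that propagate or log it than to return the string "Error" as if it were a correction.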
