I recently implemented a SpellChecker using Apache Lucene. My code is provided below:
public void loadDictionary() {
    try {
        File dir = new File("c:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        spellChecker = new SpellChecker(directory);
        Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
        spellChecker.indexDictionary(dictionary, config, false);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public String performSpellCheck(String word) {
    try {
        String[] suggestions = spellChecker.suggestSimilar(word, 1);
        if (suggestions.length > 0) {
            return suggestions[0];
        } else {
            return word;
        }
    } catch (Exception e) {
        return "Error";
    }
}
The above code uses a dictionary of English words, and I am having a problem with its accuracy. I want it to suggest similar words only for words that are spelled incorrectly (that is, words that do not appear in the dictionary being used). However, if I pass the word "post" to the performSpellCheck method, it returns "poet". In other words, it is correcting words that do not need to be corrected, because they already exist in the dictionary file.
Any suggestions on how I can improve my results?
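I think you should use the SpellChecker.exist() method, and call suggestSimilar only if the word does not exist in the dictionary. As far as I know, suggestSimilar skips the input word itself when collecting candidates, so a correctly spelled word such as "post" still comes back as its nearest neighbour, "poet".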
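Here is a minimal sketch of that check-then-suggest pattern. To keep it runnable without Lucene on the classpath, it uses a hypothetical WordIndex interface that mirrors the two SpellChecker calls, plus a toy in-memory index; WordIndex, GuardedSpellCheck, correct and toyIndex are names made up for this illustration, not part of Lucene. In the question's code the same guard would presumably wrap spellChecker.exist(word) and spellChecker.suggestSimilar(word, 1).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the two SpellChecker calls used in the question.
interface WordIndex {
    boolean exist(String word) throws IOException;
    String[] suggestSimilar(String word, int numSug) throws IOException;
}

class GuardedSpellCheck {

    // Returns the word unchanged when it is in the index; otherwise the top
    // suggestion, or the word itself when nothing similar is found.
    static String correct(WordIndex index, String word) {
        try {
            if (index.exist(word)) {
                return word; // correctly spelled, so do not ask for suggestions
            }
            String[] suggestions = index.suggestSimilar(word, 1);
            return suggestions.length > 0 ? suggestions[0] : word;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Toy in-memory index, for demonstration only. It suggests dictionary words
    // of the same length that differ from the input in exactly one character.
    static WordIndex toyIndex(String... words) {
        final Set<String> dictionary = new LinkedHashSet<>(Arrays.asList(words));
        return new WordIndex() {
            @Override
            public boolean exist(String word) {
                return dictionary.contains(word);
            }

            @Override
            public String[] suggestSimilar(String word, int numSug) {
                List<String> found = new ArrayList<>();
                for (String candidate : dictionary) {
                    if (found.size() >= numSug) {
                        break;
                    }
                    if (differsByOneChar(candidate, word)) {
                        found.add(candidate);
                    }
                }
                return found.toArray(new String[0]);
            }
        };
    }

    // True when both strings have the same length and exactly one position differs.
    private static boolean differsByOneChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int diffs = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diffs++;
            }
        }
        return diffs == 1;
    }
}
```

With GuardedSpellCheck.toyIndex("poet", "post"), correct(index, "post") returns "post" unchanged, while a misspelling such as "pist" is still corrected to "post". A word with no close match is returned as-is, which matches the fallback in performSpellCheck.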