If you search for "ahve" on my staging index (500 documents indexed), the first spellcheck suggestion is "the", because "the" appears more often than "have" in that index.
If you search for "ahve" on my local index (21 documents indexed), the first suggestion is "have", because "have" appears more often than any other word in that index.
This is a sample dump returned by my staging index:
<lst name="ahve">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">the</str>
<int name="freq">112</int>
</lst>
<lst>
<str name="word">are</str>
<int name="freq">67</int>
</lst>
<lst>
<str name="word">have</str>
<int name="freq">44</int>
</lst>
<lst>
<str name="word">acne</str>
<int name="freq">10</int>
</lst>
<lst>
<str name="word">ache</str>
<int name="freq">3</int>
</lst>
</arr>
</lst>
Adding spellcheck.onlyMorePopular=true or spellcheck.onlyMorePopular=false does NOT change anything.
Is there a way not to sort the returned suggestions by frequency of appearance?
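By default (comparatorClass set to score), suggestions are sorted by score, i.e. string similarity based on the Levenshtein distance, and ties are broken by frequency. With comparatorClass set to freq, they are sorted by frequency first and then by score.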
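To see why "the" comes first on the staging index, here is a minimal Java sketch that scores each suggestion from the dump. It assumes the spellchecker uses plain Levenshtein distance normalized as 1 - distance / max(length), which is the normalization in Lucene's LevensteinDistance; your configured distance measure may differ. The class name is made up for illustration.

```java
// Sketch: reproduce the default "score" ranking for the staging dump,
// ASSUMING score = 1 - levenshtein / max(length) (Lucene LevensteinDistance style).
public class LevenshteinScoreDemo {

    // Plain Levenshtein distance: insertions, deletions, substitutions (no transpositions).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // previous row becomes the current one
        }
        return prev[b.length()];
    }

    // Normalized similarity in [0, 1]; higher means the candidate is closer.
    public static float score(String original, String candidate) {
        int maxLen = Math.max(original.length(), candidate.length());
        if (maxLen == 0) return 1.0f; // two empty strings are identical
        return 1.0f - (float) levenshtein(original, candidate) / maxLen;
    }

    public static void main(String[] args) {
        String[] words = {"the", "are", "have", "acne", "ache"};
        int[] freqs = {112, 67, 44, 10, 3};
        for (int i = 0; i < words.length; i++) {
            System.out.printf("%-5s distance=%d score=%.2f freq=%d%n",
                words[i], levenshtein("ahve", words[i]), score("ahve", words[i]), freqs[i]);
        }
    }
}
```

Under this assumption every candidate is two edits from "ahve" and scores 0.5. Plain Levenshtein has no transposition operation, so "ahve" to "have" costs as much as "ahve" to "the", and the frequency tie-breaker alone produces the order in the dump.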
You can specify your own sort order by writing a custom class that implements java.util.Comparator<SuggestWord> (org.apache.lucene.search.spell.SuggestWord). Then set the spellchecker's comparatorClass parameter to that class's fully qualified name in your solrconfig.xml:
<lst name="spellchecker">
<str name="name">freq</str>
<str name="field">lowerfilt</str>
<str name="spellcheckIndexDir">spellcheckerFreq</str>
<!-- comparatorClass can be one of:
1. score (default)
2. freq (Frequency first, then score)
3. A fully qualified class name
-->
<str name="comparatorClass">my.custom.ComparatorClass</str>
<str name="buildOnCommit">true</str>
</lst>
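Here is a minimal, self-contained Java sketch of such a comparator. Suggestion is a hypothetical stand-in for Lucene's SuggestWord (which exposes public string, freq and score fields), so the example runs without Solr on the classpath; a real plugin would implement Comparator<org.apache.lucene.search.spell.SuggestWord> instead. ScoreThenAlphaComparator is also a made-up name. It assumes the convention of Lucene's built-in comparators, where a better suggestion compares as greater, and it ranks by score and breaks ties alphabetically rather than by frequency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ScoreThenAlphaComparator implements Comparator<ScoreThenAlphaComparator.Suggestion> {

    // Hypothetical stand-in for org.apache.lucene.search.spell.SuggestWord.
    public static final class Suggestion {
        public final String string;
        public final int freq;
        public final float score;

        public Suggestion(String string, int freq, float score) {
            this.string = string;
            this.freq = freq;
            this.score = score;
        }
    }

    // Assumed convention: a "better" suggestion compares as greater. Frequency is ignored.
    @Override
    public int compare(Suggestion first, Suggestion second) {
        int byScore = Float.compare(first.score, second.score);
        if (byScore != 0) {
            return byScore; // closer string wins
        }
        return second.string.compareTo(first.string); // tie: alphabetically earlier wins
    }

    // Returns the terms best-first, i.e. in descending comparator order.
    public static List<String> rank(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(Collections.reverseOrder(new ScoreThenAlphaComparator()));
        List<String> out = new ArrayList<>();
        for (Suggestion s : sorted) out.add(s.string);
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> staging = Arrays.asList(
            new Suggestion("the", 112, 0.5f),
            new Suggestion("are", 67, 0.5f),
            new Suggestion("have", 44, 0.5f),
            new Suggestion("acne", 10, 0.5f),
            new Suggestion("ache", 3, 0.5f));
        System.out.println(rank(staging)); // prints [ache, acne, are, have, the]
    }
}
```

With the staging scores all tied, dropping frequency only swaps one tie-breaker for another, so "have" still does not come first. Which candidate should win a tie is a decision the comparator itself has to encode.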
A couple more suggestions:
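The spellcheck.onlyMorePopular parameter doesn't affect sort order. It only restricts suggestions to terms that occur more often in the index than the original query term, and it can return suggestions even when the query term is spelled correctly, so use it with caution. In the dump above "ahve" has origFreq 0, so every candidate already passes that filter, which is why toggling it changes nothing.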
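A simplified Java model of that filter (not Solr's actual code; the class name is made up) shows that it only removes candidates and never reorders them:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of onlyMorePopular: keep candidates more frequent than the original term.
public class OnlyMorePopularDemo {

    public static List<String> filter(int origFreq, String[] words, int[] freqs) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (freqs[i] > origFreq) kept.add(words[i]); // input order is preserved
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] words = {"the", "are", "have", "acne", "ache"};
        int[] freqs = {112, 67, 44, 10, 3};
        // "ahve" has origFreq 0 in the dump, so nothing is removed.
        System.out.println(filter(0, words, freqs)); // prints [the, are, have, acne, ache]
    }
}
```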
Make sure stopwords such as 'the' and 'that' never reach the spellcheck dictionary by adding StopFilterFactory to the analyzer of the field type used for spellchecking (lowerfilt in the config above), on both the index and query side.
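As a rough Java sketch of the effect on the staging results: the stopword set below is a small hypothetical list, not Solr's stopwords.txt, and filtering the returned list only approximates removing the terms from the dictionary at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordDemo {

    // Hypothetical stopword list for illustration; use your own stopwords file in practice.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "that", "are", "a", "an"));

    // Drops stopwords while keeping the relative order of the remaining suggestions.
    public static List<String> withoutStopwords(List<String> rankedSuggestions) {
        List<String> kept = new ArrayList<>();
        for (String w : rankedSuggestions) {
            if (!STOPWORDS.contains(w)) kept.add(w);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Order as returned by the staging index.
        List<String> staging = Arrays.asList("the", "are", "have", "acne", "ache");
        System.out.println(withoutStopwords(staging)); // prints [have, acne, ache]
    }
}
```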
See: http://wiki.apache.org/solr/SpellCheckComponent for more information.