How to sort SOLR spellCheck suggestions NOT by fre

Posted 2019-03-06 00:05

Question:

If you search for ahve on my staging index, you get "the" as the first spellcheck correction, because "the" appears more often than "have" in the index (I have 500 documents indexed).
If you search for ahve on my local index, you get "have" as the first spellcheck correction, because "have" appears more often than any other word in the index (I have 21 documents indexed).
This is a simple dump returned from my staging index:

<lst name="ahve">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">the</str>
<int name="freq">112</int>
</lst>
<lst>
<str name="word">are</str>
<int name="freq">67</int>
</lst>
<lst>
<str name="word">have</str>
<int name="freq">44</int>
</lst>
<lst>
<str name="word">acne</str>
<int name="freq">10</int>
</lst>
<lst>
<str name="word">ache</str>
<int name="freq">3</int>
</lst>
</arr>
</lst>

And adding spellcheck.onlyMorePopular=true or spellcheck.onlyMorePopular=false does NOT change anything.
Is there a way not to sort the returned suggestions by frequency of appearance?

Answer 1:

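By default, spellcheck suggestions are sorted by score (the string distance, which is Levenshtein unless configured otherwise) and then by frequency. The other built-in ordering is frequency first, then score. In your dump every suggestion looks to be two edits away from ahve, so they tie on distance and frequency ends up deciding the order.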

You can specify your own sort order by writing a custom comparator class that implements Comparator<SuggestWord>. Then provide the fully qualified name of that class as the comparatorClass of the spellchecker in your solrconfig.xml.

<lst name="spellchecker">
  <str name="name">freq</str>
  <str name="field">lowerfilt</str>
  <str name="spellcheckIndexDir">spellcheckerFreq</str>
  <!-- comparatorClass can be one of:
     1. score (default)
     2. freq (Frequency first, then score)
     3. A fully qualified class name
   -->
  <str name="comparatorClass">my.custom.ComparatorClass</str>
  <str name="buildOnCommit">true</str>
</lst>
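
Since the spellchecker above is named freq rather than default, requests have to select it with spellcheck.dictionary, otherwise its comparatorClass never comes into play. As a rough sketch, the handler could be wired up like this. The handler name /spell and the count of 5 are placeholders of mine, and I'm assuming the search component is registered under the name spellcheck:

```xml
<!-- Hypothetical handler: the name "/spell" and the count value are assumptions -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn spellchecking on for every request to this handler -->
    <str name="spellcheck">true</str>
    <!-- must match the name given in the spellchecker definition -->
    <str name="spellcheck.dictionary">freq</str>
    <!-- maximum number of suggestions returned per misspelled term -->
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <!-- assumes the SpellCheckComponent is registered as "spellcheck" -->
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same selection can be made per request by adding spellcheck.dictionary=freq to the query string.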

A couple more suggestions:

  • The spellcheck.onlyMorePopular parameter doesn't affect sort ordering. When it is true, Solr returns only suggestions that would produce more hits than the original query term, and it does so even when the original term is in the index and spelled correctly. Use with caution.

  • Make sure to remove stopwords such as 'the', 'that', etc., by running the field your spellchecker is built from through StopFilterFactory in both the index and query analyzers of its field type.
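
The stopword suggestion could look something like the following in schema.xml. This is only a sketch: the field type name text_spell, the tokenizer choice, and the stopwords.txt file are my assumptions, and only the field name lowerfilt comes from the config above:

```xml
<!-- Hypothetical field type: the name "text_spell" and stopwords.txt are assumptions -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop stopwords so they never reach the spellcheck dictionary -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- same stopword list on the query side -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- the field the spellchecker reads from -->
<field name="lowerfilt" type="text_spell" indexed="true" stored="false" multiValued="true"/>
```

After a change like this you would need to reindex and rebuild the spellcheck index (for example with spellcheck.build=true) before 'the' disappears from the suggestions.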

See: http://wiki.apache.org/solr/SpellCheckComponent for more information.