How to have Solr autocomplete on whole phrase when

Published 2019-03-08 04:26

Question:

I've looked through a ton of examples and other questions here, and from them I've got my config very close to what I need, but I'm missing one last bit that I'm having a heck of a time working out. I'm searching on values like:

solar powered
solar glass
solar globe
solar lights
solar magic
solid brass
solid copper

What I want:

  1. If I search for "sol", the result should include all these values. This works.
  2. If I search for "solar", I should get just the first five. This works.
  3. If I search for "solar gl", I should get only "solar glass" and "solar globe". This does not work. Instead, I get one set of matches for "solar" and a second set of matches for "gl".

In a nutshell, I want to consider the input string as a whole, regardless of any whitespace. I gather this is accomplished by creating a separate query (versus index) analyzer, but I've not been able to make it work. Can anyone suggest a configuration that will get me what I'm looking for?
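To make the requirement concrete, here is a toy model in plain Python (not Solr) of the two behaviours: treating the whole input, spaces included, as one prefix versus looking up each whitespace-separated token on its own. The function names are hypothetical and exist only for this illustration.

```python
# Toy model of the desired vs. observed behaviour (plain Python, not Solr).
VALUES = [
    "solar powered", "solar glass", "solar globe", "solar lights",
    "solar magic", "solid brass", "solid copper",
]


def whole_phrase_suggest(user_input, values=VALUES):
    """Treat the whole input, spaces included, as a single prefix."""
    prefix = user_input.lower()
    return [v for v in values if v.lower().startswith(prefix)]


def per_token_suggest(user_input, values=VALUES):
    """Split the input on whitespace and look up each token on its own,
    producing one result set per token (the unwanted behaviour)."""
    return {tok: whole_phrase_suggest(tok, values) for tok in user_input.lower().split()}


print(whole_phrase_suggest("sol"))       # all seven values
print(whole_phrase_suggest("solar"))     # the five "solar ..." values
print(whole_phrase_suggest("solar gl"))  # ['solar glass', 'solar globe']
print(per_token_suggest("solar gl"))     # separate result sets for "solar" and "gl"
```

Only the whole-phrase variant yields the two results wanted for "solar gl"; the per-token variant mirrors the split result sets described in point 3.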

I've (unsuccessfully) tried:

  • Querying with "solar gl"
  • Querying with mm=100%
  • Defining separate query and index analyzers both using KeywordTokenizerFactory. (Dunno what the heck I thought that would do.)
  • Defining an index analyzer but not a query analyzer.
  • Defining a query analyzer with no tokenizer.

Here's my current schema:

<field name="suggest_phrase" type="suggest_phrase"
    indexed="true" stored="false" multiValued="false" />

And the field definition:

<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
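As a rough illustration of what this analyzer chain emits for each value, the sketch below approximates it with plain string operations. These functions are hypothetical stand-ins, not Lucene's tokenizers; the whitespace variant is included only for contrast.

```python
# Rough approximation of two analyzer chains using plain string operations.
# These are NOT Lucene's tokenizers; they only mimic the token output.


def keyword_lowercase(text):
    """KeywordTokenizerFactory + LowerCaseFilterFactory: the entire input
    becomes one token, which is then lowercased."""
    return [text.lower()]


def whitespace_lowercase(text):
    """For contrast: a whitespace-splitting chain yields one token per word."""
    return [tok.lower() for tok in text.split()]


print(keyword_lowercase("Solar Glass"))     # ['solar glass']
print(whitespace_lowercase("Solar Glass"))  # ['solar', 'glass']
```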

And the config:

<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
        <str name="name">suggest_phrase</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
        <str name="field">suggest_phrase</str>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest_phrase">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest_phrase</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="components">
        <str>suggest_phrase</str>
    </arr>
</requestHandler>

Answer 1:

Found the answer, finally! I knew I was really close. Turns out my configuration above was correct and I simply needed to change my query.

  1. Use KeywordTokenizerFactory so that the strings get indexed as a whole.
  2. Use SpellCheckComponent for the request handler.
  3. The piece I was missing: don't query with q=<string> but with spellcheck.q=<string>.

Given the source strings noted above and a query of spellcheck.q=solar+gl this yields the desired results:

solar glass
solar globe
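A minimal sketch of building such a request from client code. The host, port and core name (`mycore`) are assumptions; the remaining spellcheck parameters are taken from the handler defaults in the question. As far as I understand the SpellCheckComponent, spellcheck.q is analyzed with the dictionary field's own query analyzer (here the KeywordTokenizer chain), whereas q is first split into separate terms, which would explain the difference.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host, port and core name to your installation.
BASE_URL = "http://localhost:8983/solr/mycore/suggest_phrase"


def suggest_phrase_url(user_input, base_url=BASE_URL):
    """Build a request that sends the raw input as spellcheck.q (not q).

    The other parameters (spellcheck=true, spellcheck.dictionary, ...) come
    from the handler's <lst name="defaults"> shown in the question.
    """
    return base_url + "?" + urlencode({"spellcheck.q": user_input})


print(suggest_phrase_url("solar gl"))
# http://localhost:8983/solr/mycore/suggest_phrase?spellcheck.q=solar+gl
```

urlencode encodes the space as "+", which gives exactly the spellcheck.q=solar+gl form used above.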


Answer 2:

I've tried this many times and came to the conclusion that it is not possible out of the box. I found a workaround:

I indexed the data with special characters inserted between the words so that the phrase would not be split into separate tokens. For example:

solarzzzzzzpowered
solarzzzzzzglass
solarzzzzzzglobe

Then, when you compose your query, make sure you insert the same characters between the words the user typed; for example, solar gl becomes solarzzzzzzgl.

This achieves the behaviour you are asking for.
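A minimal sketch of that workaround, assuming the same separator ("zzzzzz") is applied both when preparing documents for indexing and when building the query. The helper names are hypothetical, and the separator must never occur inside a real value, otherwise decoding breaks.

```python
# Hypothetical helpers for the separator workaround; "zzzzzz" is the
# separator from the example above and must never occur inside real values.
SEP = "zzzzzz"


def encode_phrase(text, sep=SEP):
    """Glue the whitespace-separated words together with the separator.
    Use the same function at index time and at query time."""
    return sep.join(text.split())


def decode_phrase(encoded, sep=SEP):
    """Turn a stored suggestion back into readable text for display."""
    return " ".join(encoded.split(sep))


indexed = [encode_phrase(v) for v in ["solar powered", "solar glass", "solar globe"]]
query = encode_phrase("solar gl")  # 'solarzzzzzzgl'
matches = [decode_phrase(v) for v in indexed if v.startswith(query)]
print(matches)                     # ['solar glass', 'solar globe']
```

The prefix comparison here stands in for what the suggester does with the single joined token; in Solr the field's LowerCaseFilterFactory would still handle case.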

Another option would be to skip the autosuggest field and build a custom field yourself, but then you have to manage the wildcard search and all the indexing on your own, which is not very convenient in terms of time and performance.



Answer 3:

You can use AnalyzingInfixLookupFactory or FreeTextLookupFactory:

  • AnalyzingInfixLookupFactory returns the entire content of the field.
  • FreeTextLookupFactory returns a defined number of tokens.

You will find more details and other suggester algorithms here: http://alexbenedetti.blogspot.de/2015/07/solr-you-complete-me.html

Solr Configuration

<lst name="suggester">
  <str name="name">AnalyzingInfixSuggester</str>
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str> 
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">title</str>
  <str name="weightField">price</str>
  <str name="suggestAnalyzerFieldType">text_en</str>
</lst>

<lst name="suggester">
  <str name="name">FreeTextSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str> 
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">title</str>
  <str name="ngrams">3</str>
  <str name="separator"> </str>
  <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
</lst>
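The two <lst name="suggester"> blocks are not standalone: in solrconfig.xml they sit inside a <searchComponent> of class solr.SuggestComponent, which a request handler then lists under its components. Assuming such a handler is mounted at /suggest on a core called mycore (both hypothetical names), a request for one named suggester could be built as below. Note that a suggester returns nothing until its data structure has been built, for example once with suggest.build=true.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a /suggest request handler whose components include a
# solr.SuggestComponent containing the two <lst name="suggester"> blocks above.
BASE_URL = "http://localhost:8983/solr/mycore/suggest"


def suggester_url(user_input, dictionary, count=10, build=False, base_url=BASE_URL):
    """Build a SuggestComponent request for one named suggester."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # the <str name="name"> of the suggester
        "suggest.q": user_input,           # raw user input, spaces included
        "suggest.count": count,
    }
    if build:
        params["suggest.build"] = "true"   # (re)build the suggester's data structure
    return base_url + "?" + urlencode(params)


print(suggester_url("solar gl", "AnalyzingInfixSuggester"))
print(suggester_url("solar gl", "FreeTextSuggester", build=True))
```

Passing suggest.dictionary per request lets you compare the two lookups against the same input without changing the configuration.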


Tags: solr