Solr case insensitive

Posted 2019-08-07 07:04

Question:

Hello,

I'm implementing an autocompletion feature in Solr and have run into a problem.

For autocompletion I am using this field type:

<fieldType name="text_auto" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>  
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType> 

I thought that the LowerCaseFilter would make the token case-insensitive, but that is wrong. In fact, it just lowercases the token, which means that a query like "comput" leads to "computer" while "Comput" doesn't. What I actually want is for both comput and Comput to lead to Computer.

I already tried this:

<fieldType name="text_auto_low" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>  
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType> 

<fieldType name="text_auto_up" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>  
    </analyzer>
</fieldType>

For some reason it doesn't work either. My question is: why, and how can I fix this?

Answer 1:

Lucene has the Analyzer class, which comes in several built-in implementations, including these three:

  • SimpleAnalyzer : This splits the input at non-letter characters and converts all of it to lower case.
  • StopAnalyzer : This removes stop words, i.e. common words that only add noise to your search.
  • StandardAnalyzer : This applies both of the above filtering steps and thus can 'clean up' your query.

Now, coming to your question, I would recommend a technique called n-grams, which splits text into short character sequences and then searches for those fragments instead. That way, you can still get excellent results even if there are typos.
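
As a rough sketch of how this could look for the field type in the question: the variant below uses edge n-grams (n-grams anchored at the start of the value), which is the form usually used for prefix completion. The field type name text_auto_ngram and the gram sizes are assumptions chosen for illustration, not values from the question. Lowercasing is applied at both index and query time, so the typed prefix goes through the same normalization as the indexed terms.

```xml
<!-- Sketch only: the name "text_auto_ngram" and the gram sizes are assumptions. -->
<fieldType name="text_auto_ngram" class="solr.TextField" omitNorms="true">
    <!-- Index time: keep the whole value as one token, lowercase it,
         then emit its prefixes (c, co, com, ... for "Computer"). -->
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <!-- Query time: lowercase only, so "Comput" and "comput" both become "comput". -->
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With a setup along these lines, an ordinary (non-wildcard) query for either comput or Comput should match the indexed gram "comput". The match happens on the lowercased form, so the original spelling "Computer" would come from the stored value of the field rather than from the indexed terms.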

To learn how to do this, I suggest reading this to get started. It also has other great information about queries. This will not only solve your problem but also improve your app.

Have fun :D