How to ignore accent search in Solr

Posted 2020-06-18 09:51

Question:

I am using Solr as a search engine. I have a text field that contains accented text such as "María". When a user searches for "María", it returns results. But when a user searches for "Maria", it returns nothing.

My schema definition looks like below:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
</fieldtype>

Please help me solve this issue.

Answer 1:

If you're on Solr > 3.x, you can try solr.ASCIIFoldingFilterFactory, which converts accented characters to their unaccented equivalents in the basic 127-character ASCII set, where such an equivalent exists. Combined with the lowercase filter, both "María" and "Maria" end up as "maria", so either query matches.

Remember to put it after any stemming filter you have configured (you're not using one, so you should be OK).

So your config could look like:

<fieldtype name="my_text" class="solr.TextField">
       <analyzer type="index">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>
           <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
       </analyzer>
       <analyzer type="query">
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.ASCIIFoldingFilterFactory"/>
       </analyzer>
</fieldtype>
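
For a field type that does use a stemmer, the ordering advice above could be sketched roughly as follows. This is only an illustration: the field type name my_text_es, the standard tokenizer, and the Spanish Snowball stemmer are assumptions that do not appear in the question.

```xml
<!-- Hypothetical field type: the name "my_text_es" and the Spanish stemmer are assumptions -->
<fieldType name="my_text_es" class="solr.TextField">
  <!-- An analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stem first, while the accents the stemmer may rely on are still present -->
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <!-- Then fold the remaining accented characters to ASCII -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analysis changes, existing documents would need to be reindexed before unaccented queries start matching them.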


Answer 2:

Answering here because this is the first result that pops up when searching for "ignore accents solr".

In the schema.xml generated by haystack (used together with aldryn_search, djangocms and djangocms-blog), the answer provided by @soulcheck works if you add the <filter class="solr.ASCIIFoldingFilterFactory"/> line to the text_en fieldType.
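
As a rough sketch of where that line could go, assuming a simplified text_en definition (the tokenizer and filters shown here are assumptions; a generated schema.xml will typically list more filters than this):

```xml
<!-- Simplified, assumed shape of text_en; keep whatever filters your generated file already has -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added line: fold accented characters at index time -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added line: fold accented characters at query time as well -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Adding the filter to both analyzers keeps indexed and queried tokens consistent. The index would need to be rebuilt afterwards so that existing documents are folded too.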


Tags: solr