Solr How to search ñ and Ñ with normal char N and

Posted 2019-04-14 14:37

Question:

How can we map non-ASCII characters to ASCII characters?

Example: the Solr index contains words with the characters ñ and Ñ [LATIN CAPITAL LETTER N WITH TILDE] as well as plain n and N. Which filter or tokenizer should we use so that a search with plain N or with Ñ matches both?

Answer 1:

Merging the answers of Solr, Special Chars, and Latin to Cyrilic char conversion

  1. Take a look at Solr's Analyzers, Tokenizers, and Token Filters which give you a good intro to the type of manipulation you're looking for.
  2. Probably the ASCIIFoldingFilterFactory does exactly what you want.
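
As a rough sketch of the second point, a field type using the ASCIIFoldingFilterFactory might look like the one below. The name text_folded is only a placeholder, and the LowerCaseFilterFactory is an assumption added here: ASCII folding turns ñ into n and Ñ into N but does not change case, so lowercasing is what lets n and N match each other.

<fieldType name="text_folded" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <!-- assumption: lowercase first so that N and n end up as the same token -->
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- folds ñ to n (and other accented Latin characters to their ASCII equivalents) -->
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
    </analyzer>
</fieldType>

Because the analyzer is declared without a type attribute, it applies at both index and query time, so the indexed term and the search term are folded the same way.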

When changing an analyzer to remove accents, keep in mind that you need to reindex. Otherwise the accented characters stay in the index, and no user input can match them, because the query-side analysis now strips the accents.

Update

I tried the ICUFoldingFilterFactory, and it works fine with those accents. If it is tricky to set up, have a look at the SO question Can not use ICUTokenizerFactory in Solr.

This analyzer

<fieldType name="spanish" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ICUFoldingFilterFactory" />
    </analyzer>
</fieldType>

got me these analysis results; the screenshot is taken from the Solr admin UI.