I have read various threads about how to remove accents at index and query time. The field type I have come up with so far looks like this:
<fieldType name="text_general" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
After adding some test data to the index, I checked via http://localhost:8080/solr/test_core/admin/luke?fl=title
which tokens had been generated. For instance, a title like "Bayern München" was tokenized into:
<int name="bayern">1</int>
<int name="m">1</int>
<int name="nchen">1</int>
So instead of being replaced by its ASCII equivalent, the character seems to have been interpreted as a delimiter?! With the index in this state, I can search for neither "münchen" nor m?nchen.
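For comparison, this is roughly what I would have expected the Luke output to contain. It is only my assumption of the result if ASCIIFoldingFilterFactory had folded "ü" to "u" and LowerCaseFilterFactory had then lowercased the token, not actual output from my index:
<!-- expected tokens (assumption), not actual Luke output -->
<int name="bayern">1</int>
<int name="munchen">1</int>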
Any idea how to fix this? Thanks in advance.