I want to add autocomplete to the search engine on my site.
I have a field called shortdesc with the following definition:
<field name="shortdesc" type="text_de" indexed="true" stored="false" />
The field type:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LengthFilterFactory" min="3" max="20"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
<filter class="solr.GermanNormalizationFilterFactory"/>
</analyzer>
</fieldType>
For the autocomplete I need an extra field (field_autocomplete) into which I copy the field shortdesc. I don't need to retrieve data from this field, so it is defined as:
<field name="field_autocomplete" type="text_autocomplete" indexed="true" stored="false" multiValued="true" />
And the type definition:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true" />
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
And then, to copy the field:
<copyField source="shortdesc" dest="field_autocomplete"/>
OK, my first question:
- When indexing, all the content of field_autocomplete comes from the copy of shortdesc. Does that mean a value in shortdesc is analyzed first and the result is then copied to field_autocomplete? In that case I wouldn't need to repeat in the type text_autocomplete the filters that are already in text_de, because the source would arrive with those filters already applied. Is this right, or do I have to specify the full filter chain on every field type, including the ones used by copy destinations?
And another question:
- When I test this in the analyzer and enter a word that is in the stopword list, on the type text_de the filter is applied and the word doesn't appear. But when I do the same on the type text_autocomplete, the word is still there and ends up as a term; the stop filter doesn't seem to do anything.
Can anybody give me a clue about these two things? They are driving me crazy.
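Comparing the two types, the one difference I can see in the German stop filter declaration is that text_de sets format="snowball" and text_autocomplete does not. I have not confirmed that this is the cause; it is only my assumption that the stopwords file needs that attribute to be parsed correctly. A variant of the text_autocomplete stop filter that I could test would be:

```xml
<!-- Untested assumption: same German stop filter as in text_autocomplete above,
     with format="snowball" added so that it matches the declaration in text_de -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true" />
```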