In Solr, I have text that contains both "$30" and "30".
I would like a search for $30 to find only documents containing "$30".
But if someone searches for 30, they should find documents containing "$30" as well as those containing "30".
Here is the field type I'm currently using to index my text field:
<!-- Just like text_en_splitting, but with the addition of reversed tokens for leading wildcard matches -->
<fieldType name="text_en_splitting_reversed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
I have defined word-delim-types.txt to contain:
$ => DIGIT
% => DIGIT
. => DIGIT
So when I search for $30, it correctly finds documents containing "$30" but not those containing just "30". That's good. But when I search for 30, it finds only documents containing "30" and misses those containing "$30".
Is there a way to make a search for 30 match both?
I have found the solution to my question. Instead of defining $, % and . as DIGIT, I now define them as ALPHA in the "types" file that is passed to WordDelimiterFilterFactory through its types attribute.
Combined with the rest of my WordDelimiterFilterFactory settings, this gives the desired effect. As far as I can tell, $30 is now split at the boundary between the ALPHA character and the digits into the tokens $ and 30, and because autoGeneratePhraseQueries is true, a query for $30 turns into a phrase query for $ followed by 30. The result:
Searching for $30 yields only documents containing $30. Searching for 30 yields documents containing either $30 or 30.
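For reference, here is a sketch of what word-delim-types.txt looks like after the change described above. The mappings are the ones from the answer. The comment lines are my own reading of how the filter treats $30 with the settings shown in the field type, so treat them as an assumption to verify on the Analysis screen of the Solr admin UI rather than as documented behavior.

```text
# word-delim-types.txt
# $, % and . are now typed as ALPHA instead of DIGIT.
# Assumed effect: "$30" is split where ALPHA meets DIGIT into "$" and "30",
# so the token "30" is indexed for both "$30" and "30".
$ => ALPHA
% => ALPHA
. => ALPHA
```

Existing documents were analyzed with the old mappings, so they need to be reindexed before the new behavior shows up in search results.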