
How to implement wildcard search with Sunspot

Posted 2019-07-06 00:16

Question:

Any help is welcome. I am using Sunspot with Solr, but I cannot find a good way to perform a wildcard search with Sunspot.

If I search for 8088*, it should return all numbers that start with 8088, but not 228088560.

Answer 1:

Look for the following lines of code in /solr/conf/schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
    ...
</fieldType>

and replace them with this:

<fieldType name="text" class="solr.TextField" omitNorms="false">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Remember to restart the Solr server and reindex after these changes:

rake sunspot:solr:stop
rake sunspot:solr:start
rake sunspot:reindex
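The following plain-Ruby sketch imitates the index-time analysis to show why this schema makes a prefix query behave like 8088*. It does not call Solr or Sunspot. It assumes that splitting on whitespace is a fair stand-in for StandardTokenizer on plain numbers, and it uses the schema's gram sizes (1 to 20). The helper names edge_ngrams, index_terms and matches? are hypothetical. The query analyzer has no n-gram filter, so the query term is compared whole against the indexed grams.

```ruby
# Plain-Ruby imitation of the index analyzer above (not Solr itself).
# Assumption: whitespace splitting stands in for StandardTokenizer.

# Front edge n-grams of one token, like EdgeNGramFilterFactory with
# minGramSize=1 and maxGramSize=20.
def edge_ngrams(token, min_gram = 1, max_gram = 20)
  upper = [max_gram, token.length].min
  (min_gram..upper).map { |n| token[0, n] }
end

# Lowercase, tokenize, then expand every token into its leading grams.
def index_terms(text)
  text.downcase.split.flat_map { |tok| edge_ngrams(tok) }
end

# The query side is only tokenized and lowercased, so the whole query
# term must equal one of the indexed grams.
def matches?(document_text, query)
  index_terms(document_text).include?(query.downcase)
end

p edge_ngrams("8088560")
# => ["8", "80", "808", "8088", "80885", "808856", "8088560"]

p matches?("8088560", "8088")   # => true
p matches?("228088560", "8088") # => false
```

Every indexed number contributes only its leading grams. A query for 8088 can therefore match 8088560, while 228088560 never produces that term.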


Answer 2:

Sunspot gives you wildcard search for free* with the NGramTokenizer (though the NGramTokenizer sometimes has issues with substrings that are too short, along with other quirks), which means that exclusion is actually the tricky part. If you know the number of digits in the number (say 6), a crude but effective way to handle this is to use without(:field).greater_than(808900) and without(:field).less_than(808799). I don't remember whether .greater_than and .less_than are actually >= and <=; if they are just > and <, use 808899 and 808800 instead, but you get the idea.
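The small plain-Ruby sketch below checks the arithmetic behind those bounds. The helper prefix_range is hypothetical and is not part of Sunspot. It assumes the field holds non-negative integers that all have the same, known number of digits, and it returns the inclusive range of values that start with the prefix. It then prints the bounds to pass for each possible meaning of greater_than/less_than.

```ruby
# Hypothetical helper, not part of Sunspot: assumes the field stores
# non-negative integers that all have exactly `digits` digits.
def prefix_range(prefix, digits)
  pad = digits - prefix.length
  raise ArgumentError, "prefix is longer than the digit count" if pad.negative?

  low = Integer(prefix, 10) * 10**pad # "8088" padded by 2 digits -> 808800
  high = low + 10**pad - 1            # 808800 -> 808899
  low..high
end

range = prefix_range("8088", 6)
puts range # prints 808800..808899

# Values to pass to without(:field).greater_than / .less_than so that only
# the range survives, for each possible comparison semantics.
puts "if strict (> and <):      greater_than(#{range.last}), less_than(#{range.first})"
puts "if inclusive (>= and <=): greater_than(#{range.last + 1}), less_than(#{range.first - 1})"

# Plain-Ruby check of which candidates the range keeps (no Solr involved).
candidates = [808812, 808899, 808900, 808799, 228088]
p candidates.select { |n| range.cover?(n) } # => [808812, 808899]
```

This approach only works when every stored number has the same length. A 7-digit number such as 8088123 falls outside the range even though it starts with 8088.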

**Correction: there is a better solution. Change the NGramFilterFactory in your solr/conf/schema.xml to an EdgeNGramFilterFactory (assuming you had an NGramFilterFactory in the first place to get partial-word searching). The index then only produces grams that start at the beginning of each token, so 8088 matches 8088560 but no longer matches 228088560. After this, restart your server and reindex.
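Here is a minimal plain-Ruby sketch of the difference between the two filters, applied to the unwanted number from the question. It does not call Solr. The helper names ngrams and edge_ngrams are hypothetical, and the gram sizes 1 to 20 are an assumption that mirrors the schema in Answer 1.

```ruby
# Every substring of length min_gram..max_gram, like NGramFilterFactory
# (Solr's emission order differs, but membership is the same).
def ngrams(token, min_gram = 1, max_gram = 20)
  (min_gram..[max_gram, token.length].min).flat_map do |n|
    (0..token.length - n).map { |start| token[start, n] }
  end
end

# Only the grams anchored at the start of the token, like EdgeNGramFilterFactory.
def edge_ngrams(token, min_gram = 1, max_gram = 20)
  (min_gram..[max_gram, token.length].min).map { |n| token[0, n] }
end

number = "228088560"
puts ngrams(number).include?("8088")      # true: "8088" occurs mid-string
puts edge_ngrams(number).include?("8088") # false: no leading gram equals "8088"
```

With plain n-grams, the unwanted number is indexed under the term 8088 and therefore matches. Edge n-grams remove that term, which is why the filter swap alone fixes the exclusion problem.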

***All credit to Zach Moazeni at Collective Idea for this