I am trying to allow searches on partial strings in Solr so if someone searched for "ppopota" they'd get the same result as if they searched for "hippopotamus." I read the documentation up and down and feel like I have exhausted my options. So far I have the following:
Defining a new field type:
<fieldtype name="testedgengrams" class="solr.TextField">
<analyzer>
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>
</fieldtype>
Defining a field of type "testedgengrams":
<field name="text_ngrams" type="testedgengrams" indexed="true" stored="false"/>
Copying contents of text_ngrams into text:
<copyField source="text_ngrams" dest="text"/>
Alas, that doesn't work. What am I missing?
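You're using EdgeNGramFilterFactory, which only generates tokens anchored at the start of the word ('hi', 'hip', 'hipp', and so on), so it can never match 'ppopota'. Use NGramFilterFactory instead, which generates grams starting at every position in the word.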
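As a minimal sketch, this is the field type from the question with only the filter swapped. The type name `testngrams` is hypothetical, and the gram sizes are simply copied from the question rather than tuned:

```xml
<!-- Hypothetical type name; gram sizes are the ones used in the question. -->
<fieldtype name="testngrams" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Emits grams from every position in the token, not just the front,
         so "hippopotamus" also produces "ppopota". -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldtype>
```

Documents need to be reindexed after a change like this, since the grams are produced at index time.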
I'm doing the same thing with a name field, and I managed to get it working using copyField like this:
schema.xml
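A minimal sketch of what such a schema.xml setup could look like, assuming `name_de` already exists as an ordinary text field. The type name `text_partial`, the tokenizer choice, and the gram sizes are assumptions for illustration; only the field names `name_de` and `name_de_partial` come from this answer:

```xml
<!-- Hypothetical field type that indexes every substring of 2 to 15 characters. -->
<fieldType name="text_partial" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <!-- The query side is only lowercased, so the typed fragment is matched as a whole. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_de_partial" type="text_partial" indexed="true" stored="false"/>

<!-- Copy the source field INTO the n-gram field. -->
<copyField source="name_de" dest="name_de_partial"/>
```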
Then define the search condition in solrconfig.xml:
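A minimal sketch of such a search condition, assuming the dismax query parser and a handler named `/select` (both are assumptions; only the field names and boost values come from this answer):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Assumed parser; qf lists the fields to search and their boosts. -->
    <str name="defType">dismax</str>
    <str name="qf">name_de^3.0 name_de_partial^1.0</str>
  </lst>
</requestHandler>
```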
With this, Solr searches the field name_de_partial with a boost of 1.0 and the field name_de with a boost of 3.0.
So if the engine finds the query word in name_de, that document is put at the top of the list. If it also finds a match in name_de_partial, that counts as well and the document is included in the results.
The field name_de_partial uses n-gram filters, so it can find the word "hippie" from the query "hip", "ppie", or "ippi" without breaking a sweat.
To enable partial word searching, you must edit your local schema.xml file, usually under solr/config, to add either an NGramFilterFactory or an EdgeNGramFilterFactory filter.
Here's what mine looks like: sample solr schema.xml
Here's the line to paste:
EdgeNGram
I went with the EdgeN option. It doesn't allow for searching in the middle of words, but it does allow partial word search starting from the beginning of the word. This cuts way down on false positives / matches you don't want, performs better, and is usually not missed by the users. Also, I like the minGramSize=2 so you must enter a minimum of 2 characters. Some folks set this to 3.
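For illustration, an EdgeNGram filter line along these lines would go inside the analyzer of the text field type in schema.xml. The `minGramSize` of 2 matches the preference described above; the `maxGramSize` of 15 is an assumption borrowed from the question:

```xml
<!-- Indexes prefixes of each word: "hi", "hip", "hipp", ... up to 15 characters. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
```

Raising `minGramSize` to 3 gives the stricter three-character minimum that some people prefer.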
Once your local setup is working, you must also edit the schema.xml used by websolr; otherwise you will get the default behavior, which requires the full word to be entered even if you have full-text searching configured for your models.
Take it to the next level
5 ways to speed up indexing
Special instructions for editing the websolr schema.xml if you are using Heroku
If you apply EdgeNGramFilterFactory or NGramFilterFactory at both index and query time, combined with q.op=AND (or the default mm=100% if you are using dismax), you will run into problems.
Try defining NGramFilterFactory only at index time:
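A minimal sketch of an index-only n-gram setup, using separate `index` and `query` analyzers. The type name `text_ngram_index_only`, the tokenizer, and the gram sizes are assumptions for illustration:

```xml
<fieldType name="text_ngram_index_only" class="solr.TextField">
  <!-- Index time: break each token into n-grams. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <!-- Query time: no n-gram filter, so each query term stays a single token. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this arrangement a query term only matches if it is between `minGramSize` and `maxGramSize` characters long, because those are the only grams that exist in the index.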
Or try setting q.op=OR (or mm=1 if you are using dismax).
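As a sketch, the dismax variant could be set as a handler default in solrconfig.xml. The handler name `/select` is an assumption:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Require only one clause of the query to match. -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With the standard query parser, the equivalent is to pass `q.op=OR` as a request parameter.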