I have made the following type definition in Solr:
<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
It should index values verbatim (no tokenization).
I add the value "skinny jeans" to my index.
When I run the following search query (URL-decoded for readability), I get no results:
http://myvm:8983/solr/mycore/select?q=*:*&fq=name:("skinny jeans")&wt=json&indent=true&debugQuery=true
You can see the URL is searching for everything (*:*) with a filter query for the exact value "skinny jeans".
I then add the value "jeans" to my index, and run a similar query with
&fq=name:("jeans")
And I do find the "jeans" element.
So it works for a single word, but not for multiple words. Why would this be? I'm searching for an exact value after all. It makes me suspect that the KeywordTokenizerFactory is doing something odd. Can anyone please advise why no results are being returned from such a basic setup?
Thanks,
This happens because you are using KeywordTokenizerFactory at index time, which applies no tokenization and keeps the entire value as a single token. At query time, however, you are using WhitespaceTokenizerFactory, which splits the input on whitespace.
So KeywordTokenizerFactory stores "skinny jeans" in the index as one token, while WhitespaceTokenizerFactory turns the query into the two tokens "skinny" and "jeans". You are searching for "skinny", "jeans" against "skinny jeans", so nothing matches. The single-word value "jeans" works because both tokenizers produce the same token for it.
You need to change either the index tokenizer or the query tokenizer so that both produce the same tokens.
If you want an exact match, use KeywordTokenizerFactory as the tokenizer in both the index analyzer and the query analyzer.
You can check the tokens created at index time and at query time using the Solr analysis tool.
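For reference, here is a minimal sketch of what the exact-match field type could look like. It reuses the text_phrase definition from the question and only swaps the query tokenizer; that you want no further filters (such as lowercasing) is an assumption on my part.

```xml
<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: keep the whole value as a single token -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <!-- Query time: same tokenizer, so "skinny jeans" stays one token -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Since both analyzers are now identical, a single <analyzer> element without a type attribute should behave the same way. Reload the core after changing the schema. Only the query analyzer changes here, so the existing indexed values should not need to be reindexed.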