Solr - KeywordTokenizerFactory - Exact Match for M

Posted 2019-06-13 16:59

Question:

I have made the following type definition in Solr:

<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>    
</fieldType>

It should index values verbatim (no tokenization).

I add the value "skinny jeans" to my index.

When I run the following search query (url decoded for reading) I get no results:

http://myvm:8983/solr/mycore/select?q=*:*&fq=name:("skinny jeans")&wt=json&indent=true&debugQuery=true

You can see the URL is searching for everything (*:*) with a filter query for the exact value "skinny jeans".

I then add the value "jeans" to my index, and run a similar query with

&fq=name:("jeans")

And I do find the "jeans" element.


So it works for a single word, but not for multiple words. Why would this be? I'm searching for an exact value after all. It makes me suspect that the KeywordTokenizerFactory is doing something odd. Can anyone please advise why no results are being returned from such a basic setup?

Thanks,

Answer 1:

This happens because your index analyzer uses KeywordTokenizerFactory, which performs no tokenization and emits the entire field value as a single token, while your query analyzer uses WhitespaceTokenizerFactory, which splits the query text into separate tokens at whitespace.

So at index time, KeywordTokenizerFactory stores "skinny jeans" as one single token.

At query time, WhitespaceTokenizerFactory produces two tokens: "skinny" and "jeans".

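You can see the difference: the two won't match. You are searching for the tokens "skinny" and "jeans" against an index that contains only the single token "skinny jeans". The single-word case works because both tokenizers produce the same token, "jeans".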
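The mismatch described above can be sketched in a few lines of Python. This is only a simplified model, not Solr code: the helper names `keyword_tokenize`, `whitespace_tokenize`, and `phrase_matches` are hypothetical, and the matching logic assumes a quoted query behaves like a phrase of consecutive tokens.

```python
def keyword_tokenize(text):
    """Mimic KeywordTokenizerFactory: the whole input becomes one token."""
    return [text]


def whitespace_tokenize(text):
    """Mimic WhitespaceTokenizerFactory: split the input on whitespace."""
    return text.split()


def phrase_matches(indexed_tokens, query_tokens):
    """Return True if query_tokens appear as a consecutive run in indexed_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        indexed_tokens[i:i + n] == query_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )


# Index side: the keyword tokenizer keeps the value as one token.
indexed = keyword_tokenize("skinny jeans")

# Original setup: the query side splits on whitespace, so nothing matches.
print(phrase_matches(indexed, whitespace_tokenize("skinny jeans")))  # False

# Same tokenizer on both sides: the single token matches.
print(phrase_matches(indexed, keyword_tokenize("skinny jeans")))  # True

# A one-word value matches either way, as observed in the question.
print(phrase_matches(keyword_tokenize("jeans"), whitespace_tokenize("jeans")))  # True
```

The model ignores everything else an analyzer chain might do (filters, lowercasing, position gaps); it only shows why the token lists differ for multi-word values.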

You need to change either the index tokenizer or the query tokenizer so that both produce the same tokens.

If you want exact matching, use KeywordTokenizerFactory in both the index analyzer and the query analyzer, as shown below:

<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>    
</fieldType>

You can compare the tokens created at index time with the tokens created at query time using the Solr analysis tool.
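If you prefer to run the same check over HTTP, the sketch below builds a request URL for Solr's field analysis handler. It assumes the handler is reachable at `/analysis/field` on your core (depending on the Solr version it may need to be configured in solrconfig.xml), and it reuses the host and core name from the question. The function name `build_field_analysis_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_field_analysis_url(base_url, core, field_type, value):
    """Build a field-analysis URL that analyzes `value` as both index and query text."""
    params = {
        "analysis.fieldtype": field_type,  # field type to analyze with
        "analysis.fieldvalue": value,      # text run through the index analyzer
        "analysis.query": value,           # text run through the query analyzer
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# Host and core are taken from the question; adjust them for your setup.
url = build_field_analysis_url(
    "http://myvm:8983/solr", "mycore", "text_phrase", "skinny jeans"
)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the token streams for both analyzers, so you can see whether they line up.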