I am new to Solr. I want to know when to use StandardTokenizerFactory and when to use KeywordTokenizerFactory.
I read the docs on the Apache Wiki, but I still don't understand them.
Can anybody explain the difference between StandardTokenizerFactory and KeywordTokenizerFactory?
StandardTokenizerFactory :-
It tokenizes on whitespace and punctuation, and strips the delimiter characters from the output.
Documentation :-
You would use this for fields where you want to search within the field data.
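As a rough sketch, a searchable text field in schema.xml could be wired up like this. The names `text_search` and `description` are made up for illustration, and the lowercase filter is an optional extra I am assuming you would want for case-insensitive search:

```xml
<!-- Hypothetical field type: "text_search" is an illustrative name -->
<fieldType name="text_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits the value into individual word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Optional: lowercases each token so matching ignores case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="description" type="text_search" indexed="true" stored="true"/>
```

With a setup along these lines, a query for any single word in the field value can match the document, because each word is indexed as its own term.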
e.g. -
would generate 7 tokens (separated by comma) -
KeywordTokenizerFactory :-
Keyword Tokenizer does not split the input at all.
No processing is performed on the string, and the whole string is treated as a single entity.
This doesn't actually do any tokenization. It returns the original text as one term.
It is mainly used for sorting and faceting. Faceting needs the whole value as one term so that a multi-word facet value can be matched exactly when filtering, and sorting does not work on tokenized fields.
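A minimal sketch of that kind of setup, assuming you keep a tokenized field for searching and copy it into an untokenized one for faceting and sorting. The names `string_exact`, `category` and `category_exact` are hypothetical:

```xml
<!-- Hypothetical field type: the whole value is indexed as one term -->
<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <!-- Emits the entire input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field used only for faceting and sorting -->
<field name="category_exact" type="string_exact" indexed="true" stored="false"/>

<!-- Copies the raw value from an assumed existing "category" field -->
<copyField source="category" dest="category_exact"/>
```

You would then facet or sort on `category_exact` and keep searching on the tokenized field, so a multi-word value shows up as one facet entry instead of one entry per word.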
e.g.
would generate a single token -