Difference between StandardTokenizerFactory and Ke

Posted 2019-01-27 14:26

I am new to Solr. When should I use StandardTokenizerFactory, and when KeywordTokenizerFactory?

I read the docs on the Apache Wiki, but I still don't understand it.

Can anybody explain the difference between StandardTokenizerFactory and KeywordTokenizerFactory?

Tags: java solr solrnet tokenize

1 answer

贼婆χ

#2 · 2019-01-27 15:03

StandardTokenizerFactory :-
It tokenizes on whitespace and punctuation, and strips the punctuation characters from the output.

Documentation :-

Splits words at punctuation characters, removing punctuations. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token. In that case, the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

Use this for fields where you want to search on individual words within the field data.

e.g. -

http://example.com/I-am+example?Text=-Hello

would generate 7 tokens (shown here separated by commas) -

http,example.com,I,am,example,Text,Hello
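
The token list above can be reproduced with a rough approximation. The sketch below is not Lucene's implementation; it assumes a simplified rule in which a run of letters and digits forms a token, and a dot stays inside a token only when it has an alphanumeric character on both sides. The name `approx_standard_tokens` is hypothetical.

```python
import re

# Simplified stand-in for the standard tokenizer, NOT Lucene's actual algorithm:
# a token is a run of ASCII letters/digits, optionally joined by single dots
# that have an alphanumeric character on both sides (e.g. "example.com").
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")

def approx_standard_tokens(text):
    """Return the approximate tokens for `text`, dropping all other characters."""
    return _TOKEN.findall(text)

tokens = approx_standard_tokens("http://example.com/I-am+example?Text=-Hello")
print(tokens)
```

The real tokenizer handles many more cases (non-ASCII text, numbers, and so on), so this is only meant to make the splitting behaviour in the example concrete.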

KeywordTokenizerFactory :-

The Keyword Tokenizer does not split the input at all.
No processing is performed on the string; the whole string is treated as a single entity and returned unchanged as one term.

Mainly used for sorting and faceting. For faceting, it lets a multi-word value be matched exactly as a whole when filtering on that facet; for sorting, it is needed because sorting does not work on tokenized fields.

e.g.

http://example.com/I-am+example?Text=-Hello

would generate a single token -

http://example.com/I-am+example?Text=-Hello
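
To see why this matters for faceting, here is a minimal sketch that counts facet values with and without splitting. It does not use Solr; `whitespace_tokens` is a hypothetical stand-in for any tokenizer that splits, `keyword_tokens` mirrors the single-token behaviour described above, and the `cities` values are made-up sample data.

```python
from collections import Counter

def keyword_tokens(value):
    # Mirrors the keyword behaviour: the whole value is one token.
    return [value]

def whitespace_tokens(value):
    # Hypothetical stand-in for a splitting tokenizer: break on whitespace.
    return value.split()

def facet_counts(values, tokenize):
    # Count, for each indexed term, how many documents contain it.
    counts = Counter()
    for value in values:
        counts.update(set(tokenize(value)))
    return counts

# Made-up field values, one per document.
cities = ["New York", "New Jersey", "New York"]

split_facets = facet_counts(cities, whitespace_tokens)
exact_facets = facet_counts(cities, keyword_tokens)
print(dict(split_facets))
print(dict(exact_facets))
```

With splitting, the facet buckets are the individual words ("New", "York", "Jersey"), which is rarely what a facet should show. With a single token per value, the buckets are the full values "New York" and "New Jersey".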
