Which is the best choice for indexing a Boolean value?

Posted 2019-06-15 01:28

I am indexing a Boolean value (true/false) in Lucene. It does not need to be stored. I want lower disk space usage and higher search performance.

doc.add(new Field("boolean","true",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new Field("boolean","1",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new NumericField("boolean",Integer.MAX_VALUE,Field.Store.NO,true).setIntValue(1));

Which should I choose? Or is there a better way?

Thanks a lot.

Tags: java lucene
3 answers
我命由我不由天
#2 · 2019-06-15 01:35

Use Solr (a search server built on Lucene). It indexes all basic Java types natively.

I've used it and it rocks.

祖国的老花朵
#3 · 2019-06-15 01:42

An interesting question!

  • I don't think the third option (NumericField) is a good choice for a boolean field. I can't think of any use case for this.
  • The Lucene search index (leaving to one side stored data, which you aren't using anyway) is stored as an inverted index: each term maps to the list of documents that contain it.
  • That leaves your first and second options (theoretically) identical, since either way the field has just two distinct terms.
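To make the inverted-index argument concrete, here is a minimal sketch of a toy term-to-documents map. It is not Lucene's implementation or API; the class `ToyInvertedIndex` and its methods are made up for illustration. It only shows that "true"/"false" and "1"/"0" produce the same number of terms and the same document lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT how Lucene stores its index on disk.
class ToyInvertedIndex {
    // term -> ids of the documents that contain that term, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Record that document docId contains the given untokenized term.
    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Return the ids of all documents containing the term (empty if none).
    List<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptyList());
    }

    // Number of distinct terms in the index.
    int termCount() {
        return postings.size();
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true, true};
        ToyInvertedIndex words = new ToyInvertedIndex();   // option one: "true"/"false"
        ToyInvertedIndex digits = new ToyInvertedIndex();  // option two: "1"/"0"
        for (int docId = 0; docId < flags.length; docId++) {
            words.add(docId, flags[docId] ? "true" : "false");
            digits.add(docId, flags[docId] ? "1" : "0");
        }
        System.out.println(words.search("true")); // [0, 2, 3]
        System.out.println(digits.search("1"));   // [0, 2, 3]
    }
}
```

In this model the only difference between the two options is the length of the two term strings, which are each kept once per term and not once per document.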

If I were faced with this, I think I would choose option one ("true" and "false" terms), if that helps with the final decision.

Your choice of NOT_ANALYZED_NO_NORMS looks good, I think.

乱世女痞
#4 · 2019-06-15 01:52

Lucene jumps through an elaborate set of hoops to make NumericField searchable by NumericRangeQuery, so definitely avoid it in all cases where your values don't represent quantities. For example, even if you index an integer, but only as a unique ID, you would still want to use a plain String field. Using "true"/"false" is the most natural way to index a boolean, while using "1"/"0" gives just a slight advantage by avoiding the possibility of a case mismatch or typo. I'd say this advantage is not worth much, and I would go for true/false.
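One way to keep "true"/"false" and still rule out case mismatches or typos is to produce the term in a single place. The sketch below assumes a hypothetical helper class, `BooleanTerms`, which is not part of Lucene; the same string would be used both when indexing and when building the query.

```java
import java.util.Locale;

// Hypothetical helper (not a Lucene class): one source of truth for the
// exact term text used when indexing and when searching a boolean field.
final class BooleanTerms {
    static final String TRUE_TERM = "true";
    static final String FALSE_TERM = "false";

    private BooleanTerms() {}

    // Convert a Java boolean into the canonical index term.
    static String toTerm(boolean value) {
        return value ? TRUE_TERM : FALSE_TERM;
    }

    // Normalize loosely typed input (e.g. from a user query) to the canonical
    // term, and fail loudly on anything that is not a recognizable boolean.
    static String normalize(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.equals(TRUE_TERM)) {
            return TRUE_TERM;
        }
        if (s.equals(FALSE_TERM)) {
            return FALSE_TERM;
        }
        throw new IllegalArgumentException("Not a boolean term: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(toTerm(true));         // true
        System.out.println(normalize(" FALSE ")); // false
    }
}
```

With the field definition from the question, the indexing call would then pass `BooleanTerms.toTerm(flag)` as the value instead of a hand-typed literal, and the query side would wrap the same string in a `Term` for a `TermQuery`.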
