I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:
Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"), Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"), Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"), Field.Store.YES));
w.addDocument(doc);
It seems I can't query the ticket_id field at all, while id_s works just fine.
One of the documents is (I added whitespace for readability):
Document<
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W>
stored<ticket_id:152>
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>
So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.
What am I doing wrong? How can I add such a field to the index and make it searchable?
Numeric fields can be queried with a NumericRangeQuery. For an exact match, simply set the min and max to the same value, with both bounds inclusive.
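To my understanding, the Lucene 4.x call for this is NumericRangeQuery.newIntRange("ticket_id", 152, 152, true, true), where the two booleans are minInclusive and maxInclusive. The flags matter: equal bounds that are exclusive describe an empty range. The plain-Java model below only illustrates that bound semantics; IntRangeModel is a hypothetical class, not part of Lucene, and nothing here touches an index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an int range with per-bound inclusiveness; NOT a Lucene class. */
final class IntRangeModel {
    private final int min, max;
    private final boolean minInclusive, maxInclusive;

    IntRangeModel(int min, int max, boolean minInclusive, boolean maxInclusive) {
        this.min = min;
        this.max = max;
        this.minInclusive = minInclusive;
        this.maxInclusive = maxInclusive;
    }

    /** Exact match: the same value for both bounds, both inclusive. */
    static IntRangeModel exact(int value) {
        return new IntRangeModel(value, value, true, true);
    }

    /** True if v lies inside the range, honouring each bound's inclusiveness. */
    boolean matches(int v) {
        boolean aboveMin = minInclusive ? v >= min : v > min;
        boolean belowMax = maxInclusive ? v <= max : v < max;
        return aboveMin && belowMax;
    }

    /** Returns the candidates that fall inside the range, in input order. */
    List<Integer> filter(int... candidates) {
        List<Integer> hits = new ArrayList<>();
        for (int c : candidates) {
            if (matches(c)) hits.add(c);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] ticketIds = {151, 152, 153};
        System.out.println(exact(152).filter(ticketIds));                                // [152]
        System.out.println(new IntRangeModel(152, 152, false, false).filter(ticketIds)); // []
    }
}
```

With both flags set to false the "exact" range silently matches nothing, which is an easy mistake to make when copying a range query.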
Your output indicating the field is not indexed could be due to the differences in how a numeric value is indexed compared to a text value. Because the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed as a term.
At a glance, however, your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as simple identifiers that happen to be written with digits. If you don't need numeric sorting or range querying on the field, indexing it as a StringField certainly makes more sense.
Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term
Basically, you create a Term with your int value like this:
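As far as I know, Lucene 4.x does this by writing the value into a BytesRef with NumericUtils.intToPrefixCoded (shift 0) and then calling new Term("ticket_id", bytesRef); the exact signature of intToPrefixCoded changed during the 4.x series, so check it against your version. Because that needs the Lucene jar, the sketch below re-implements the encoding in plain Java instead. PrefixCodedInt is a hypothetical helper, and its byte layout (a marker byte 0x60 + shift, then the sign-flipped value in 7-bit groups) is my reading of Lucene 4.x, so treat it as an assumption. It shows why the text "152" never matches: the indexed term is a different byte sequence, and one whose unsigned byte order follows numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical plain-Java sketch of a Lucene-4.x-style prefix-coded int term.
 * ASSUMPTION: layout is [0x60 + shift] followed by the sign-flipped, shifted
 * value split into 7-bit groups, most significant group first.
 */
final class PrefixCodedInt {
    static final int SHIFT_START_INT = 0x60; // assumed marker byte, modeled on Lucene 4.x

    static byte[] encode(int value, int shift) {
        if (shift < 0 || shift > 31) throw new IllegalArgumentException("shift must be 0..31");
        int nChars = (31 - shift) / 7 + 1;              // 7-bit groups needed for the remaining bits
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_INT + shift);      // records the precision of this term
        int sortable = (value ^ 0x80000000) >>> shift;  // flip sign bit so negatives sort first
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortable & 0x7f);          // low 7 bits go at the end
            sortable >>>= 7;
        }
        return out;
    }

    /** Unsigned lexicographic comparison, the order Lucene 4.x keeps term bytes in. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] term = encode(152, 0);
        byte[] text = "152".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(term));                          // [96, 8, 0, 0, 1, 24] with this sketch's layout
        System.out.println(Arrays.equals(term, text));                      // false: "152" is not the indexed term
        System.out.println(compareBytes(encode(-5, 0), encode(152, 0)) < 0); // true: numeric order is preserved
    }
}
```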
Then you can use this term for searching, or for deleting/updating documents in your index. In a first test this worked fine for me, but I can't tell whether it is the "right" way to do things. I've used NumericRangeFilter before to filter IntFields, but now I'm inclined to use this approach with regular TermsFilters or TermQueries instead.
The following works for me:
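As femtoRgon pointed out, for numeric values (longs, dates, floats, etc.) you need to use a NumericRangeQuery and, if the field was indexed with a non-default precision step, pass that same precisionStep to the query. Otherwise the query looks for lower-precision terms that were never written to the index.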
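To see why the step matters, the sketch below lists the trie shifts at which a 32-bit value gets a term for a given precisionStep. It assumes Lucene 4.x's scheme of one term per shift 0, p, 2p, ... below 32, and TrieShifts is a hypothetical helper, not a Lucene class. A wider range query built with a different step looks up lower-precision terms at shifts the index never wrote; as far as I can tell, a single-value range such as 152..152 resolves to the full-precision (shift 0) term either way.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: the shifts at which a 32-bit value is indexed for a given
 * precisionStep. ASSUMPTION: mirrors Lucene 4.x's shifts 0, p, 2p, ... below 32.
 */
final class TrieShifts {
    static List<Integer> shifts(int precisionStep) {
        if (precisionStep < 1) throw new IllegalArgumentException("precisionStep must be >= 1");
        List<Integer> result = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            result.add(shift); // one term per shift, each dropping `shift` low-order bits
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shifts(4)); // [0, 4, 8, 12, 16, 20, 24, 28]
        System.out.println(shifts(8)); // [0, 8, 16, 24]
        // A step-3 range query would ask for shift-3 terms that a step-4 index never wrote.
        System.out.println(shifts(4).contains(3)); // false
    }
}
```

A smaller step writes more terms per value (a bigger index) but lets range queries be covered by fewer terms; that trade-off is why the step is configurable at all.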