If I have a field x that can contain a value of y, z, etc., is there a way to query the index so that it returns only the distinct values that have actually been indexed?
For example, the values that x can be set to are test1, test2, test3 and test4:
Item 1 : Field x = test1
Item 2 : Field x = test2
Item 3 : Field x = test4
Item 4 : Field x = test1
Performing the required query would return the list: test1, test2, test4 (test3 is never used by any item, so it is not returned).
I once used Lucene 2.9.2, and there I used the FieldCache approach described in the book "Lucene in Action" (Manning):
String[] fieldValues = FieldCache.DEFAULT.getStrings(indexReader, fieldname);
The array fieldValues contains the value of the field fieldname for every document in the index (for example ["NY", "NY", "NY", "SF"]), so it is up to you how to process it. Usually you create a HashMap<String, Integer> that counts the occurrences of each possible value, in this case NY=3, SF=1. Maybe this helps. It is quite slow and memory-consuming for very large indexes (e.g. 1,000,000 documents), but it works.
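A minimal sketch of the counting step, assuming you already have the per-document array that FieldCache.DEFAULT.getStrings(reader, field) returns (here it is hard-coded so the example runs without Lucene). It uses a TreeMap rather than a HashMap so the distinct values come out sorted; the class and method names are hypothetical, and skipping null entries assumes documents without a value for the field show up as null.

```java
import java.util.Map;
import java.util.TreeMap;

public class FieldValueCounts {
    /**
     * Counts how often each value occurs in a per-document value array,
     * such as the one returned by FieldCache.DEFAULT.getStrings(reader, field).
     */
    public static Map<String, Integer> countValues(String[] fieldValues) {
        // TreeMap keeps the distinct values sorted, which is handy for display
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            if (value == null) {
                continue; // assumed: document has no value for this field
            }
            counts.merge(value, 1, Integer::sum); // add 1 to the running count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the array FieldCache would return
        String[] fieldValues = {"NY", "NY", "NY", "SF"};
        Map<String, Integer> counts = countValues(fieldValues);
        System.out.println(counts);          // {NY=3, SF=1}
        System.out.println(counts.keySet()); // [NY, SF] -> the distinct values
    }
}
```

The keySet() of the map is exactly the list of distinct indexed values the question asks for; the counts come for free.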
I think a WildcardQuery searching on field 'x' and value of '*' would do the trick.
I've implemented this before as an extension method:
You can use it very easily like this:
That will return you what you want.
EDIT: Out of absent-mindedness I was skipping the first term in the code above; I've updated it so that it works properly.
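Lucene's term dictionary is sorted by field and then by term text, so listing the distinct values of x amounts to seeking to the first term of x and walking forward until the field changes. In Lucene 2.x/3.x that walk usually goes through IndexReader.terms(new Term(field, "")), whose TermEnum is already positioned on the first matching term, so calling next() before reading term() skips one value, plausibly the kind of slip the EDIT describes. The sketch below is not the extension method from this answer: it is a stdlib-only model of that loop, with a hypothetical TermCursor standing in for a TermEnum so it runs without Lucene.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class DistinctTermsSketch {

    /** A (field, text) pair ordered like a term dictionary: by field, then by text. */
    record Term(String field, String text) implements Comparable<Term> {
        public int compareTo(Term o) {
            int c = field.compareTo(o.field);
            return c != 0 ? c : text.compareTo(o.text);
        }
    }

    /**
     * Hypothetical cursor modelled on a TermEnum: right after construction
     * it is already positioned on the first term >= start.
     */
    static final class TermCursor {
        private final Iterator<Term> it;
        private Term current;

        TermCursor(NavigableSet<Term> dict, Term start) {
            it = dict.tailSet(start, true).iterator();
            current = it.hasNext() ? it.next() : null;
        }

        Term term() { return current; }

        boolean next() {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }
    }

    static List<String> distinctValues(NavigableSet<Term> dict, String field) {
        List<String> values = new ArrayList<>();
        TermCursor cursor = new TermCursor(dict, new Term(field, ""));
        // do/while: read the current term BEFORE advancing, otherwise the
        // first value of the field is silently skipped
        do {
            Term t = cursor.term();
            if (t == null || !t.field().equals(field)) {
                break; // ran past the last term of this field
            }
            values.add(t.text());
        } while (cursor.next());
        return values;
    }

    public static void main(String[] args) {
        // Terms an index would hold for the question's four items; duplicates collapse
        NavigableSet<Term> dict = new TreeSet<>();
        for (String v : new String[]{"test1", "test2", "test4", "test1"}) {
            dict.add(new Term("x", v));
        }
        dict.add(new Term("y", "other")); // a different field, must not be returned
        System.out.println(distinctValues(dict, "x")); // [test1, test2, test4]
    }
}
```

Because every indexed value appears in the dictionary exactly once per field, this walk returns each distinct value once, without touching any documents.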
You can use facets to return the first N values of a field, provided the field is indexed as a string, or with KeywordTokenizer and no filters. In other words, the field must not be tokenized: each value is stored as a single term exactly as it is.
Just set the following properties on a query:
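Assuming the index is served by Solr (the mention of KeywordTokenizer suggests a Solr schema), the standard faceting request parameters are facet, facet.field, facet.limit and facet.mincount, and rows=0 suppresses the document list so only the facet counts come back. The hedged sketch below just assembles such a query string with the JDK's URLEncoder; it does not send a request, the class and method names are hypothetical, and the parameter set is an assumption rather than necessarily the exact properties this answer intended.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FacetQuerySketch {
    /** Builds a Solr-style query string asking for the top n values of a field. */
    static String facetParams(String field, int n) {
        Map<String, String> p = new LinkedHashMap<>(); // keeps insertion order
        p.put("q", "*:*");                       // match all documents
        p.put("rows", "0");                      // only the facet counts are wanted
        p.put("facet", "true");                  // turn faceting on
        p.put("facet.field", field);             // the untokenized field to facet on
        p.put("facet.limit", String.valueOf(n)); // return at most n values
        p.put("facet.mincount", "1");            // skip values with no matching documents
        return p.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // q=*%3A*&rows=0&facet=true&facet.field=x&facet.limit=10&facet.mincount=1
        System.out.println(facetParams("x", 10));
    }
}
```

The response then lists each value of x with its document count, which for the question's example would be test1, test2 and test4.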