I'm currently trying to implement a Lucene.NET based search on a large database and I've hit a snag trying to do a search on what is essentially relational data.
At a high level the data I'm trying to search is grouped, each item belongs to 1 to 3 groups. I then need to be able to do a search for all items that are in a combination of groups (EG: Each item belongs to both group A and group B).
Each of these groupings have ID's and Descriptions existing from the data I'm searching, but the descriptions may be sub-strings of one another (EG: One group named "Stuff" and the other "Other stuff"), and I don't want to match the categories that have a sub-string of the one I'm looking for.
I've been considering pulling the data back without this filtering and then filtering the ID's, but I was intending to paginate the data returned from Lucene for performance reasons. I've also considered putting the ID's in space-separated and doing a text-search on the field, but that seems like a total hack...
Does anyone have any idea how to best handle this kind of search in Lucene.NET? (Just to clarify before someone says I'm using the wrong tool, this is only a subset of a larger set of filters which includes full-text searching. If you still think I'm using the wrong tool though I'd love to hear what the right one is)
I've had my share of problems with storing relational data i Lucene but the one you have should be easy to fix.
I guess you tokenize the group fields and that makes it possible to search for substrings in the field value. Just add the field untokenized and it should work like expected.
Please check the following small piece of code:
The sample adds two documents to the index which are untokenized, does a search for stuff and gets one hit. If you changed the code to add them tokenized then you will have two hits as you see now.
The issue with using Lucene for relational data is that it might be expected that wildcard and range searches always will work. That is not really the case if the index is big due to way Lucene resolves those queries.
Another sample to illustrate the behavior: