First some background to my question.
- Individual entities can have read Permissions.
- If a user fails a read permission check, they cannot see that instance.
The problem relates to introducing Lucene: a search simply returns a list of matching entity instances, which my code would then need to filter one by one. This is extremely inefficient, because a user may only be allowed to see a small minority of the matches, and checking many entities to return a few is less than ideal.
What approaches would developers use to solve this problem, keeping in mind that indexing and searching are performed using Lucene?
EDIT
Definitions
- A User may belong to many Groups.
- A Role may have many Groups - these can change.
- A Permission has a Role - (indirection).
- X can have a read Permission.
- It is possible for the definition of a Role to change at any time.
Indexing
- Adding the set of Groups (expanding a Permission) at index time may result in the indexed definition becoming out of sync when the list of member Groups for a Role changes.
- I am hoping to avoid having to reindex X whenever the definition of a Permission/Role changes.
Security Check
- To pass a Permission check, a User must belong to a Group that is within the set of Groups belonging to the Role for the given Permission.
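To make the check concrete, here is a minimal Java sketch of the model described above. The names (`Role`, `Permission`, `ReadCheck.canRead`) are assumptions made for illustration, and groups are represented as plain strings; this is not taken from an actual codebase.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of the definitions above; all names are illustrative.
final class Role {
    private final Set<String> groups; // in the real system this set may change at any time
    Role(Set<String> groups) { this.groups = groups; }
    Set<String> groups() { return groups; }
}

final class Permission {
    private final Role role; // indirection: a Permission points at a Role
    Permission(Role role) { this.role = role; }
    Role role() { return role; }
}

final class ReadCheck {
    private ReadCheck() {}

    /** True if the user belongs to at least one Group of the Permission's Role. */
    static boolean canRead(Set<String> userGroups, Permission permission) {
        // disjoint() is true when the two sets share no element, so negate it
        return !Collections.disjoint(userGroups, permission.role().groups());
    }
}
```

Running this check once per search hit is exactly the one-by-one filtering I would like to avoid.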
It depends on the number of different security groups that are relevant in your context and how the security applies to your indexed data.
We had a similar issue, which we solved the following way: when indexing, we added the allowed groups to the document, and when searching, we added a boolean query with the groups the user was a member of. That performed well in our scenario.
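As a rough illustration of the search side, the sketch below wraps the user's query in Lucene's classic query-parser syntax and adds a required clause on the user's groups. The field name `allowedGroup` and the helper `GroupRestrictedQuery.restrict` are assumptions, and group names are assumed to be simple tokens that need no escaping.

```java
import java.util.Collection;
import java.util.StringJoiner;

final class GroupRestrictedQuery {
    private GroupRestrictedQuery() {}

    // Assumption: each document was indexed with a multi-valued field
    // "allowedGroup" holding the groups that may read it.
    static String restrict(String userQuery, Collection<String> userGroups) {
        if (userGroups.isEmpty()) {
            // A user without groups can see nothing; the caller should skip the search.
            throw new IllegalArgumentException("user belongs to no group");
        }
        StringJoiner groups = new StringJoiner(" OR ", "(", ")");
        for (String group : userGroups) {
            groups.add(group);
        }
        // Both clauses are required (+): match the query AND at least one group.
        return "+(" + userQuery + ") +allowedGroup:" + groups;
    }
}
```

For example, `restrict("title:lucene", List.of("editors", "admins"))` yields `+(title:lucene) +allowedGroup:(editors OR admins)`, which can then be handed to the query parser.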
It depends on your security model. If permissions are simple (say you have three classes of documents), it is probably best to build a separate Lucene index per class, and merge the results when a user can see more than one class.
The Solr security Wiki suggests something similar to HakonB's suggestion: adding the user's credentials to the query and searching by them.
See also this discussion in the Lucene user group.
Another strategy would be to wrap the Lucene search with a separate security class that does the additional filtering outside of Lucene. It may be faster if you can do this using a database for the permissions.
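A minimal sketch of such a wrapper, assuming the search has already produced a ranked list of entity IDs and that the permission check is available as a predicate (for instance backed by a database lookup). `PostFilteringSearch` and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class PostFilteringSearch {
    private PostFilteringSearch() {}

    // rankedHits: entity IDs in the order the search returned them (assumed input).
    // canRead:    the permission check, e.g. a database-backed lookup (hypothetical).
    // limit:      how many visible results the caller wants.
    static List<String> visibleHits(List<String> rankedHits, Predicate<String> canRead, int limit) {
        List<String> visible = new ArrayList<>();
        for (String id : rankedHits) {
            if (visible.size() >= limit) {
                break; // stop checking permissions once the page is full
            }
            if (canRead.test(id)) {
                visible.add(id);
            }
        }
        return visible;
    }
}
```

Stopping as soon as the page is full limits the number of permission checks, although a user who can see very few documents still forces a scan over most of the hits.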
Edit:
I see you have a rather complex permission system. Your basic design choice is whether to implement it inside Lucene or outside Lucene. My advice is to use Lucene as a search engine (its primary strength) and use another system/application for security. If you choose to use Lucene for security anyway, I suggest you learn Lucene Filters well, and use a bitset filter in order to filter a query's results. This approach does have the problem you listed: the permissions stored in the index have to be kept up to date.
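The idea behind a bitset filter can be shown with plain `java.util.BitSet`, independent of any Lucene version: one bit per internal document number, and the results are the intersection of the query's hits with the documents the user may read. `BitSetSecurityFilter` is a hypothetical name, and how the `readable` set is computed and cached is left open here.

```java
import java.util.BitSet;

final class BitSetSecurityFilter {
    private BitSetSecurityFilter() {}

    // queryHits: one bit per document number that matched the query.
    // readable:  one bit per document number the current user may read
    //            (assumed to be computed elsewhere, e.g. cached per user).
    static BitSet apply(BitSet queryHits, BitSet readable) {
        BitSet result = (BitSet) queryHits.clone(); // leave both inputs untouched
        result.and(readable);                       // keep only bits set in both
        return result;
    }
}
```

The intersection itself is cheap; the cost lies in rebuilding the `readable` set whenever a Role or Permission changes.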
As Yuval mentioned, it might be worth having the permission mechanism independent of the Lucene index.
One way to do it is to implement your own Collector that will filter out the results that the user should not have access to.
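The shape of such a collector, sketched without depending on a particular Lucene version: `HitSink` is a hypothetical stand-in for the engine's per-hit callback (it is not Lucene's `Collector` API), and `canRead` is an assumed permission check on the document number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a search engine's hit callback; not Lucene's Collector API.
interface HitSink {
    void collect(int docId);
}

final class SecurityFilteringSink implements HitSink {
    private final IntPredicate canRead;                  // assumed permission check
    private final List<Integer> accepted = new ArrayList<>();

    SecurityFilteringSink(IntPredicate canRead) {
        this.canRead = canRead;
    }

    @Override
    public void collect(int docId) {
        // Drop hits the user may not see before they reach the result list.
        if (canRead.test(docId)) {
            accepted.add(docId);
        }
    }

    List<Integer> accepted() {
        return accepted;
    }
}
```

Because the check runs for every matching document, it needs to be cheap, for example a lookup in a precomputed set.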
What I would suggest is having two kinds of documents:
1) Real_documents with a field called: "DocumentID"
2) A security document with fields: "Role" "Groups" "Users" "PermissionId" "DocumentIds"
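Then the pseudo-code could be:
// 1) Look up the security document(s) for the current user and read the allowed IDs.
Field[] docIds = searcher.search("Users", "currentUser").getFields("DocumentIds");
// 2) Build a filter that accepts only real documents whose "DocumentID" is in that list.
TermsFilter filter = new TermsFilter();
for (Field field : docIds) {
    filter.addTerm(new Term("DocumentID", field.stringValue()));
}
// 3) Run the actual query, restricted by the filter.
searcher.search(query, filter, numberOfDocuments);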
Because Lucene is very fast at searching, running two searches is cheap. This way you also get a better tf-idf per user.