Neo4j how to avoid supernodes

Posted 2020-04-09 22:03

Question:

In my Neo4j project I have Role and Permission entities which represent user roles and permissions. Each User in the system has relationships to appropriate sets of roles and permissions.

I think Role and Permission are a kind of supernode that could become a major performance headache in the future.

What is the best practice for this case? How should I reimplement Role and Permission in order to avoid possible issues with supernodes?

Answer 1:

Do you plan to run aggregate or bulk queries based on Roles (for example, counting or listing the people who hold a certain role)?

If not, and you just want to check whether a specific user has a certain Role, then in my humble opinion it should not cause serious, hard-to-maintain performance issues, because you will traverse only specific relationships of the graph and ignore the vast majority of the relationships on your "supernodes". I would stick with the simple design ("premature optimization is the root of all evil" ;) ). Be aware that, internally, relationships are stored in a linked-list-like structure, so finding the right one may take time on a supernode, even if you restrict the search to a certain relationship type. Once problems are actually noticed, splitting the Role nodes using the meta-node approach (it's described in Learning Neo4j) should do the job.

If yes, you have a problem. That's probably an area in which RDBMSs are better... Using meta nodes probably won't help, as you will still have to process all of them to list or count all users... So caching that data in a separate store may simply be the best idea...
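To make the meta-node idea more concrete, here is a minimal in-memory sketch in Python. It is only one possible reading of the approach, not the exact recipe from Learning Neo4j: the `RoleFanOut` class, the bucket count and the CRC32-based bucket choice are all assumptions made for illustration, and no Neo4j API is involved.

```python
import zlib


class RoleFanOut:
    """Hypothetical model: one Role split into several meta nodes.

    Each user is attached to exactly one meta node, so no single node
    carries all of the role's relationships.
    """

    def __init__(self, name, buckets=4):
        self.name = name
        self.buckets = buckets  # assumed number of meta nodes per role
        self.meta_nodes = [set() for _ in range(buckets)]

    def _bucket_for(self, user_id):
        # Stable hash, so the same user always maps to the same meta node.
        return zlib.crc32(user_id.encode("utf-8")) % self.buckets

    def grant(self, user_id):
        self.meta_nodes[self._bucket_for(user_id)].add(user_id)

    def has_role(self, user_id):
        # Only one meta node is inspected, not the whole role.
        return user_id in self.meta_nodes[self._bucket_for(user_id)]

    def count_users(self):
        # Aggregate queries still have to visit every meta node.
        return sum(len(members) for members in self.meta_nodes)
```

A single-user check touches one bucket, while `count_users` still walks all of them, which is why the split helps point lookups much more than it helps aggregate queries.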



Answer 2:

I'm going to assume that you're just using Neo4j as a permissions lookup data source (like hasPermission(current_user, 'permission_string')) and that it isn't tied into any queries against other entities. That can be fine, especially if you have a hierarchical access schema. If that's not true, then this might not apply, and it would be good to have a clearer idea of what your entities look like.

Since you're likely using permissions throughout your application, and if they're going to grow in size and scope, it could make sense for performance to use some form of caching, such as an in-memory store or Redis.
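As a rough illustration of the in-memory option, the sketch below puts a small time-limited cache in front of whatever function actually asks the graph. The `PermissionCache` class, the `lookup` callable and the 60-second TTL are assumptions for the example; a Redis-backed version would follow the same shape with a different storage layer.

```python
import time


class PermissionCache:
    """Hypothetical in-process cache in front of a slower permission lookup."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup      # assumed callable: (user_id, permission) -> bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # (user_id, permission) -> (value, expires_at)

    def has_permission(self, user_id, permission):
        key = (user_id, permission)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]          # fresh cached answer, no graph query
        value = self._lookup(user_id, permission)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate_user(self, user_id):
        # Drop every cached answer for one user, e.g. after a role change.
        for key in [k for k in self._entries if k[0] == user_id]:
            del self._entries[key]
```

The TTL bounds how stale an answer can get if an invalidation is missed; explicit invalidation keeps changes visible immediately in the common case.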

It might even make sense to generate a denormalized cache of every permission state for every user. You would evaluate your rules, which might be based on hierarchical roles/permissions, and come out with a list of "User X has permission Y" entries. Then, whenever you change a user or a permission, you'd regenerate the cache for that entity, and if you changed a role, you would regenerate the cache for all of the associated users and permissions.
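The regeneration rule described above could be sketched roughly as follows. This is a simplified, flat model (no role hierarchy), and the `DenormalizedPermissions` class and its method names are hypothetical, chosen only to show when each part of the cache gets rebuilt.

```python
class DenormalizedPermissions:
    """Hypothetical store that precomputes "User X has permission Y"."""

    def __init__(self):
        self.user_roles = {}        # user -> set of role names
        self.user_direct = {}       # user -> set of directly granted permissions
        self.role_permissions = {}  # role -> set of permissions
        self.cache = {}             # user -> frozenset of effective permissions

    def _rebuild_user(self, user):
        perms = set(self.user_direct.get(user, ()))
        for role in self.user_roles.get(user, ()):
            perms |= self.role_permissions.get(role, set())
        self.cache[user] = frozenset(perms)

    def set_user(self, user, roles, direct=()):
        # Changing a user regenerates only that user's entry.
        self.user_roles[user] = set(roles)
        self.user_direct[user] = set(direct)
        self._rebuild_user(user)

    def set_role(self, role, permissions):
        # Changing a role regenerates every user associated with it.
        self.role_permissions[role] = set(permissions)
        for user, roles in self.user_roles.items():
            if role in roles:
                self._rebuild_user(user)

    def has_permission(self, user, permission):
        # Pure lookup: no rule evaluation happens at check time.
        return permission in self.cache.get(user, frozenset())
```

The trade-off is that writes become more expensive (a role change touches every holder of that role), in exchange for permission checks that are a single set membership test.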

Also, I don't know if I would apply this advice only to Neo4j. If you're talking about a simple key/value lookup, then a lot of general-purpose databases would be overkill in performance-critical situations.