Design: Relation vs. Node

Posted 2019-08-01 15:34

Question:

I am new to graph databases and overwhelmed by their promised scope and power. When designing an app, it is very important that we align our intuitive design patterns with practical performance considerations. One question that is bothering me is whether to store a certain piece of information as a relationship attribute or as a node attribute. Here is the use case. We have entities that are related by an incoming relationship, "service_provider": the start node is a provider to the end node. Each service provider is tied to its customers (or consumers) by a contract, which covers things like frequency of service, cost per service, no-show charges, and so on. The contract values can vary from service to service, but the attributes will always be there.

So my question is: should these contracts be separate nodes, each connected to a pair of entities (service provider and consumer), or should they be part of the relationship itself?

Please note that my gut feeling leans toward making it part of the relationship, which may be why my description paints that picture. However, that need not be the case. I want to hear your views so that I can settle on an approach.

If you think this question is in the wrong place, please suggest a better location.

I have already referred to these:

Boosting recommendation results @ docs.neo4j.org - All the docs indicate that what I described is a possible solution. But I have two concerns: the examples are light on relationship attributes, and they don't really weigh the pros and cons of either approach.

Multiple relationships of the same ... @ Stackoverflow - Not really the same question, but it is relevant to the use case.

Following up on the response from @bendaizer, here is a performance question comparing #1 and (partly) #4. Assume contracts are defined by the service provider (at least in most cases), so the only way a service provider can connect to a consumer is via a contract. We then have a service provider surrounded by contracts, and the contracts connect to consumers. When I look up a consumer by service provider, I have to make the extra hop through the contract. With #1, the same contract information can instead be put in a relationship property. Which one is expected to perform better, assuming all contracts are unique? Either way, I don't want to lose the ability to answer questions like: which customers of "Service Provider X" are paying a rate of $50 an hour? ($50/hr is part of the contract info.)
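To make the two layouts concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j code; the node names, the `rate_per_hour` and `frequency` properties, and both helper functions are made up for illustration. It only shows that the "$50 an hour" question can be answered in either layout, with one traversal step in the first and two in the second.

```python
# Hypothetical in-memory stand-ins for the two layouts; this is not Neo4j API.

# Layout A: contract details are properties on the service_provider relationship.
# Each relationship is (provider, consumer, properties).
relationships_with_props = [
    ("X", "alice", {"rate_per_hour": 50, "frequency": "weekly"}),
    ("X", "bob", {"rate_per_hour": 40, "frequency": "monthly"}),
    ("Y", "alice", {"rate_per_hour": 50, "frequency": "daily"}),
]


def consumers_by_rate_rel(provider, rate):
    """One hop: filter the provider's relationships on a property."""
    return sorted(
        consumer
        for start, consumer, props in relationships_with_props
        if start == provider and props["rate_per_hour"] == rate
    )


# Layout B: every contract is its own node between provider and consumer.
contract_nodes = {
    "c1": {"rate_per_hour": 50, "frequency": "weekly"},
    "c2": {"rate_per_hour": 40, "frequency": "monthly"},
    "c3": {"rate_per_hour": 50, "frequency": "daily"},
}
provider_to_contracts = {"X": ["c1", "c2"], "Y": ["c3"]}
contract_to_consumer = {"c1": "alice", "c2": "bob", "c3": "alice"}


def consumers_by_rate_node(provider, rate):
    """Two hops: provider -> contract node -> consumer."""
    return sorted(
        contract_to_consumer[contract_id]
        for contract_id in provider_to_contracts.get(provider, [])
        if contract_nodes[contract_id]["rate_per_hour"] == rate
    )


rel_result = consumers_by_rate_rel("X", 50)
node_result = consumers_by_rate_node("X", 50)
```

Both functions return the same consumers for the same data. The sketch says nothing about how a real graph database would perform with either layout; that would need to be measured on actual data.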

Also, check out the same question at the Google Forum.

Answer 1:

There is no general rule of thumb for deciding how to model a graph. You have different options, and I think you should stick with the one you find easiest that also gives you good performance.

I actually see 4 options in your case, sorted here in my order of preference:

  • You define a service_provider relationship type and add a key/value property {contract : type of contract}. You can then index the relationships by their properties, so you can retrieve the contracts and the corresponding start and end nodes from the index. Very simple, and it does the job.

  • You can define indexing nodes for the contract types, and each time you add a new provider to some node, you also link that node to the type of the contract. This of course implies that the provider is unique, or else you won't be able to tell which contract is linked to which provider for a given client node. I honestly don't think this is the best solution, unless you want to use your database for explicit pattern identification and matching over the type of contract (I would recommend it only for advanced graph mining, if that is what you're planning to do).

  • You can define a node for each contract (instead of one node per contract type); you will have a lot more nodes. But this might be useful if you need "edge over edge" relationships, which can be the case for individual classification purposes. It is useful if your nodes can have different providers and different types of contract, and the contract can be attached to specific features on an individual basis.

  • You can define two relationships between the nodes, one of type service_provider and one whose type is the type of the contract. I honestly think the 1st approach is better if it's just for storing the information. This one can also be useful for future pattern matching, though in that case I recommend the second one.
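As a rough illustration of the first option, here is a small plain-Python sketch of indexing relationships by a property so that the start and end nodes can be read back from the index. This is not how Neo4j implements its indexes; the node names, the contract values, and the `build_relationship_index` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relationship store: (start node, end node, properties).
relationships = [
    ("provider_x", "consumer_a", {"contract": "hourly"}),
    ("provider_x", "consumer_b", {"contract": "monthly"}),
    ("provider_y", "consumer_a", {"contract": "hourly"}),
]


def build_relationship_index(rels, key):
    """Map each value of the given property to the (start, end) pairs that carry it."""
    index = defaultdict(list)
    for start, end, props in rels:
        if key in props:
            index[props[key]].append((start, end))
    return index


contract_index = build_relationship_index(relationships, "contract")

# Look up all provider/consumer pairs bound by an "hourly" contract.
hourly_pairs = contract_index["hourly"]
```

The lookup goes from the property value straight to the pairs of nodes, which is the idea behind retrieving the contracts and their start and end nodes from an index.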

As you can see, it depends a lot on what you're planning to do with your graph. Hope this helps!