We're currently in the process of implementing a CRM-like solution internally for a professional firm. Because of the nature of the information stored, and because its keys and values vary from record to record, we decided to use a document database (MongoDB in this case), as it suits that purpose well.
As part of this CRM solution we wish to store relationships and associations between entities; examples include conflict-of-interest information, shareholders, trustees, etc. To link all these entities together effectively, we determined that a central "relationship" model was necessary. All relationships should have history information attached to them (commencement and termination dates), as well as varying metadata; for example, a shareholder relationship would also contain the number of shares held.
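As an illustration only, a relationship record of this kind could be sketched in Python as below. The class name, field names, and sample values are all hypothetical, and treating the termination date as exclusive is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class Relationship:
    """One directed association between two entities (hypothetical schema)."""
    source_id: str                        # e.g. the shareholder
    target_id: str                        # e.g. the company
    kind: str                             # "shareholder", "trustee", "client", ...
    commenced: date
    terminated: Optional[date] = None     # None means still active
    meta: Dict[str, Any] = field(default_factory=dict)  # per-kind extras

    def active_on(self, day: date) -> bool:
        """True if the relationship had started and not yet ended on `day`.
        Assumption: the termination date itself counts as inactive."""
        if day < self.commenced:
            return False
        return self.terminated is None or day < self.terminated


# Invented sample: a shareholding with its share count kept in `meta`.
holding = Relationship(
    source_id="john",
    target_id="xyz limited",
    kind="shareholder",
    commenced=date(2010, 1, 1),
    terminated=date(2012, 6, 30),
    meta={"shares_held": 500},
)
```

Keeping the type-specific values in a free-form `meta` mapping mirrors the varying keys mentioned above; a stricter model could use one subclass per relationship kind instead.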
As traditional RDBMS solutions didn't suit our earlier needs, using them in our current situation is not viable. What I'm trying to determine is whether a graph database is more pertinent in our case, or whether simply using MongoDB's built-in support for references between documents is appropriate.
The relationship information is going to be used quite heavily throughout the system. Some examples of the queries we wish to perform are:
- Get all 'key contact' people of companies who are 'clients' of 'xyz limited'
- Get all other 'shareholders' of companies where 'john' is a shareholder
- Get all 'Key contact' people of entities who are 'clients' of 'abc limited' and are clients of 'trust us bank limited'
Given this "tree" structure of relationships, is using a graph database (such as Neo4j) more appropriate?
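To make the shape of these queries concrete, here is a rough Python sketch of the second one ("other shareholders of companies where john is a shareholder") over a flat list of relationship records. The tuple layout, the helper names, and the sample data are invented for illustration; this is neither MongoDB nor Neo4j code.

```python
# Each record reads "source is a <kind> of target". Sample data is invented.
RELATIONSHIPS = [
    ("john", "shareholder", "acme ltd"),
    ("mary", "shareholder", "acme ltd"),
    ("john", "shareholder", "widgets ltd"),
    ("peter", "shareholder", "widgets ltd"),
    ("susan", "shareholder", "other ltd"),
]


def targets_of(source, kind):
    """Entities that `source` has a `kind` relationship to."""
    return {t for s, k, t in RELATIONSHIPS if s == source and k == kind}


def sources_of(target, kind):
    """Entities that have a `kind` relationship to `target`."""
    return {s for s, k, t in RELATIONSHIPS if t == target and k == kind}


def co_shareholders(person):
    """Other shareholders of every company in which `person` holds shares."""
    found = set()
    for company in targets_of(person, "shareholder"):   # hop 1: person -> companies
        found |= sources_of(company, "shareholder")     # hop 2: company -> holders
    found.discard(person)                               # "other" shareholders only
    return found
```

Each hop in this sketch is a separate lookup over the whole record list, which is roughly what chaining queries against a document collection would look like; the number of lookups grows with every extra link in the question being asked.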
Mike,
You should be able to store your relationship data in a graph database. Its high performance when traversing big graphs comes from locality: you don't run queries globally, but rather start at a set of nodes (which correspond to documents in your case) that are looked up through an index. You might even store start-node IDs in your MongoDB documents for quick access. From there, the cost of a traversal depends on the part of the graph you actually visit, not on the total size of the data set.
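The locality argument can be shown with a small in-memory sketch in Python. It is a toy model, not Neo4j's API: the edge list, the `incoming` index, and the `follow` helper are made up for the example. The start node is found by key, and each hop only inspects the edges attached to the nodes already reached.

```python
from collections import defaultdict

# (source, kind, target) edges; sample data is invented.
EDGES = [
    ("acme ltd", "client_of", "xyz limited"),
    ("widgets ltd", "client_of", "xyz limited"),
    ("alice", "key_contact_of", "acme ltd"),
    ("bob", "key_contact_of", "widgets ltd"),
    ("carol", "key_contact_of", "unrelated ltd"),
]

# Index: node -> its incoming edges, so a hop never scans the full edge list.
incoming = defaultdict(list)
for source, kind, target in EDGES:
    incoming[target].append((kind, source))


def follow(start, kinds):
    """Walk backwards from `start`, one hop per relationship type in `kinds`.
    Returns the nodes reached and the number of edges inspected on the way."""
    frontier = {start}
    inspected = 0
    for kind in kinds:
        next_frontier = set()
        for node in frontier:
            for edge_kind, source in incoming.get(node, []):
                inspected += 1
                if edge_kind == kind:
                    next_frontier.add(source)
        frontier = next_frontier
    return frontier, inspected


# "Key contact people of companies who are clients of xyz limited"
contacts, inspected = follow("xyz limited", ["client_of", "key_contact_of"])
```

Here the edge from carol to the unrelated company is never looked at, and adding any number of further unrelated edges would not change the work done for this query.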
What are your other requirements (e.g. data set size, number of concurrent accesses, relationship/graph complexity)?
Your queries are a really good fit for a graph database and easily expressible in its terms.
I'd suggest that you just grab a graph database like Neo4j and do a quick spike with your domain to verify the general feasibility, and also to find out what additional questions you would like answered before investing in a second technology.
P.S. If you hadn't started yet, you could also have gone with a pure graph database approach, as graph databases are a superset of document databases. In your case you would rather model your domain anyway than just store generic documents. (For example, structr is a CMS built on top of Neo4j.)
The documents in MongoDB very much resemble nodes in Neo4j, minus the relationships. They both hold key-value properties. If you've already made the choice to go with MongoDB, then you can use Neo4j to store the relationships and then bridge the stores in your application. If you're choosing new technology, you can go with Neo4j for everything, as the nodes can hold property data just as well as documents can.
As for the relationship part, Neo4j is a great fit. You have a graph, not unrelated documents. Using a graph database makes perfect sense here, and the sample queries have graph written all over them.
Honestly though, the best way to find out what works for you is to do a PoC - low cost, high value.
Disclaimer: I work for Neo Technology.
Stay with MongoDB. Two reasons: 1. it's better to stay in the same domain if you can, to reduce complexity, and 2. MongoDB is excellent for querying and requires less work than Redis, for example.
We ended up using both; we are implementing a search engine for a transportation network.
Trying to implement relationships in MongoDB can become unwieldy once you go beyond one or two "links". Essentially you would be storing ObjectIds in an array, and if you want bi-directional relationships, you have to implement two separate links. In MongoDB, a "pointer" to an entity (a "link") is just another property whose value your application chooses to interpret as a reference; it is not a first-class object like a relationship in Neo4j.
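A minimal Python sketch of the "two separate links" problem follows. Plain dicts stand in for MongoDB documents and string IDs stand in for ObjectIds; the field names `shareholder_of` and `shareholders` are hypothetical.

```python
# Plain dicts stand in for documents; field names are hypothetical.
documents = {
    "john": {"_id": "john", "shareholder_of": []},
    "acme": {"_id": "acme", "shareholders": []},
}


def link_shareholder(person_id, company_id):
    """Record the relationship on both documents: two writes per logical link."""
    person = documents[person_id]
    company = documents[company_id]
    if company_id not in person["shareholder_of"]:
        person["shareholder_of"].append(company_id)
    if person_id not in company["shareholders"]:
        company["shareholders"].append(person_id)


def unlink_shareholder(person_id, company_id):
    """Removal must touch both sides too, or the two arrays drift apart."""
    person = documents[person_id]
    company = documents[company_id]
    if company_id in person["shareholder_of"]:
        person["shareholder_of"].remove(company_id)
    if person_id in company["shareholders"]:
        company["shareholders"].remove(person_id)


link_shareholder("john", "acme")
```

Against a real database the two writes would be separate update operations, so the application has to cope with one succeeding and the other failing; metadata such as dates or share counts would also need a home on one side or the other.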
So we decided to use Neo4j to store the relationships and MongoDB to store everything else. The challenge then became keeping the two stores in sync.
We are using a 10gen lab project called "MongoConnector", which is a mechanism to keep MongoDB in sync with another store. The project is currently unsupported, but the code is available:
http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
MongoConnector uses the replication mechanism to implement the syncing. Essentially you monitor the MongoDB OpLog and implement callbacks for any upserts (update or insert) and deletes. This implementation is called a "DocumentManager" in MongoConnector speak. We ended up implementing a Neo4jDocumentManager.
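The callback idea can be sketched in Python as below. This is a simplified stand-in, not the real MongoConnector interface and not the Neo4jDocumentManager mentioned above: the `GraphMirror` class, the `replay` function, the `relationships` field, and the shape of the log entries are all assumptions made for the example, and an in-memory dict takes the place of the graph store.

```python
class GraphMirror:
    """Hypothetical document manager: receives upsert and delete callbacks
    and mirrors each document's relationship fields into an in-memory graph."""

    def __init__(self):
        self.edges = {}   # document id -> set of (kind, target id)

    def upsert(self, doc):
        # Replace whatever edges were previously derived from this document.
        self.edges[doc["_id"]] = {
            (rel["kind"], rel["target"]) for rel in doc.get("relationships", [])
        }

    def remove(self, doc_id):
        # Deleting a document drops all edges that originated from it.
        self.edges.pop(doc_id, None)


def replay(log, manager):
    """Feed simplified, invented oplog-style entries to the manager in order."""
    for entry in log:
        if entry["op"] in ("insert", "update"):
            manager.upsert(entry["doc"])
        elif entry["op"] == "delete":
            manager.remove(entry["doc"]["_id"])


log = [
    {"op": "insert", "doc": {"_id": "john", "relationships": [
        {"kind": "shareholder", "target": "acme"}]}},
    {"op": "insert", "doc": {"_id": "mary", "relationships": []}},
    {"op": "update", "doc": {"_id": "john", "relationships": [
        {"kind": "shareholder", "target": "acme"},
        {"kind": "trustee", "target": "family trust"}]}},
    {"op": "delete", "doc": {"_id": "mary"}},
]

mirror = GraphMirror()
replay(log, mirror)
```

Because the mirror is only updated as log entries are processed, the second store always lags the first by at least the processing delay, and a crashed consumer has to resume from a remembered position in the log.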
On the query side, we found that Neo4j is better suited to "friend of a friend" kinds of queries, whereas MongoDB was better for general-purpose queries, i.e. per-field queries or range queries dealing with dates.
I've been planning to give a talk and write a blog post, but I haven't got to it yet:
http://www.meetup.com/graphdb-boston/events/91703472/
There are drawbacks to this solution, like things getting out of sync if a process goes down or syncing being slow (not in realtime).