Could someone point me to a working use of a graph database for genealogy? I would like to learn Neo4j, and since I use Python I was planning to build a genealogy graph database for myself as a learning exercise. I searched for examples to emulate and learn from (any graph database, any language) but was surprised how little I found.
Note that I mean a graph database, which has a different structure than a relational database. See http://en.m.wikipedia.org/wiki/Graph_database.
I'm looking for an example schema for genealogy.
If you want to learn graph databases, you don't need any software to start. Pencil, paper and brain will do. To come up with a design, keep the following in mind:
- What a graph is: vertices and edges.
- What is specific about the graph database data structure: each vertex and each edge is associated with a Python-like dict of properties.
- What information needs to be in the graph database to solve the problem at hand. List all the queries you want to be able to run against the graph.
In the diagram below, you will see a graph that can be the basis of your design.
You have to imagine that every node has a name, date of birth, etc... and a unique identifier.
It represents two disconnected families: on the left, one with two children; on the right, one with three children.
With the above graph you could compute:
- Who is the parent of X?
- What is the name of the father of the biggest family?
There are others. Because the graph only contains two families with parents and children, and no grandparents or grandchildren, it may not be obvious that you could also compute the following query:
- Which people who have X as an ancestor are still alive?
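To make the last query concrete, here is a minimal pure-Python sketch. The sample people, the `parent_of` edge list and the function names are all made up for illustration, and `death=None` is an assumed convention for "still alive"; it is not tied to any particular graph database.

```python
from collections import deque

# Hypothetical sample data: id -> properties. death=None means "still alive".
people = {
    1: {"name": "Ann", "birth": 1920, "death": 1990},
    2: {"name": "Bob", "birth": 1945, "death": None},
    3: {"name": "Cid", "birth": 1950, "death": 2010},
    4: {"name": "Dee", "birth": 1975, "death": None},
}

# Directed edges, each one meaning "parent -> child".
parent_of = [(1, 2), (1, 3), (3, 4)]


def children(person_id):
    """Follow the edges forwards."""
    return [child for parent, child in parent_of if parent == person_id]


def parents(person_id):
    """Follow the edges backwards: who is the parent of X?"""
    return [parent for parent, child in parent_of if child == person_id]


def living_descendants(ancestor_id):
    """Breadth-first walk over all descendants, keeping the living ones."""
    seen = set()
    queue = deque(children(ancestor_id))
    alive = []
    while queue:
        person_id = queue.popleft()
        if person_id in seen:
            continue
        seen.add(person_id)
        if people[person_id]["death"] is None:
            alive.append(people[person_id]["name"])
        queue.extend(children(person_id))
    return alive
```

With this data, `living_descendants(1)` walks through Bob, Cid and Dee and keeps only those with no recorded death. The same traversal works unchanged once grandparents and grandchildren are added, which is the point of the query above.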
Now if you want to go and experiment with Python you have several choices, starting with the easiest setup:
Pure Python:
- Create a Vertex class and an Edge class that inherit from dict.
- Build a genealogy graph in Python code, from real data or otherwise.
- Experiment with queries.
- Experiment with queries.
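The pure Python steps could look roughly like the sketch below. This is only one possible layout: the `uid`, `outgoing` and `incoming` attributes, the `PARENT_OF` label and the `family` helper are assumptions made for the example, not the API of any library.

```python
import itertools

_ids = itertools.count(1)  # source of unique identifiers


class Vertex(dict):
    """A node: a dict of properties plus a unique id and its edges."""

    def __init__(self, **properties):
        super().__init__(**properties)
        self.uid = next(_ids)
        self.outgoing = []
        self.incoming = []


class Edge(dict):
    """A directed, labelled link between two vertices, with properties."""

    def __init__(self, start, label, end, **properties):
        super().__init__(**properties)
        self.start, self.label, self.end = start, label, end
        start.outgoing.append(self)
        end.incoming.append(self)


def family(father_name, child_names):
    """Hypothetical helper: build one father with his children."""
    father = Vertex(name=father_name)
    for child_name in child_names:
        Edge(father, "PARENT_OF", Vertex(name=child_name))
    return father


# Two disconnected families, with two and three children.
fathers = [family("Left", ["a", "b"]), family("Right", ["c", "d", "e"])]


def parents_of(vertex):
    """Who is the parent of X?"""
    return [e.start for e in vertex.incoming if e.label == "PARENT_OF"]


# Father of the biggest family: the one with the most outgoing edges.
biggest = max(fathers, key=lambda f: len(f.outgoing))
print(biggest["name"])
```

Because the classes subclass dict, two vertices with identical properties compare equal, so the sketch relies on `uid` (or `is`) to tell them apart.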
Python and Berkeley DB:
- disclaimer: this is a project of mine
- Same as the pure Python version, except the graph is saved in a database. The API is similar to the Neo4j Python bindings.
There are other solutions, but without more context about the target application (e.g. web or desktop) I cannot list them all. There is some information on the Neo4j website that can be helpful.
That said, the best solution might involve Neo4j, but you need Rexster (for a networked application) or Blueprints (for others) if you want to easily switch between several databases to find the one that performs best for your use case. The only reason to use a Neo4j server directly is to be able to use the Cypher query language.
If I had to create a genealogy web app and build a business out of it, I would use software that I've built, namely:
- Java-GraphitiDB
- Graphiti ORM
Those are not ready for production as-is. But that's what I would do.
If you want to use a fast database without a server (and without a JVM), I suggest you try the brand new Sparksee (formerly Dex) Python binding. The raw API is not portable, however. The performance, on the other hand, is orders of magnitude faster.
The second option is to use Bulbs, which runs on top of Neo4j via its REST API; it also supports any Rexster server. The query language is Gremlin (Cypher works too). The good point is that you can switch to a different backend if it better suits your needs.
Concerning your DB schema, you need at least one node type and one edge type:
1 node: PERSON (name, birth, death), where these are indexed fields.
1 directed, restricted edge from PERSON to PERSON, named CHILD_OF or PARENT_OF.
You can add more edges between nodes such as: SIBLINGS, MARRIED_TO, etc.
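As a rough illustration of this schema outside any database, the sketch below models PERSON as a frozen dataclass, keeps a simple index on `name`, and enforces the PERSON-to-PERSON restriction on CHILD_OF. The `Genealogy` class and its method names are invented for the example; in Neo4j or Bulbs the index and the edge would be declared through that tool's own API instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """PERSON node. A real schema would also carry a unique id, since
    two people can share the same name and dates."""
    name: str
    birth: int
    death: Optional[int] = None


class Genealogy:
    def __init__(self):
        self.by_name = defaultdict(list)   # index on the name field
        self.parents = defaultdict(set)    # CHILD_OF: child -> parents
        self.children = defaultdict(set)   # same edges, reverse direction

    def add_person(self, person):
        self.by_name[person.name].append(person)
        return person

    def child_of(self, child, parent):
        # Restricted edge: both ends must be PERSON nodes.
        if not (isinstance(child, Person) and isinstance(parent, Person)):
            raise TypeError("CHILD_OF links a PERSON to a PERSON")
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def siblings(self, person):
        """Derive siblings from shared parents instead of storing them."""
        found = set()
        for parent in self.parents[person]:
            found |= self.children[parent]
        found.discard(person)
        return found


g = Genealogy()
mary = g.add_person(Person("Mary", 1950))
tom = g.add_person(Person("Tom", 1975))
eve = g.add_person(Person("Eve", 1978))
g.child_of(tom, mary)
g.child_of(eve, mary)
```

Here `g.siblings(tom)` is computed from the two CHILD_OF edges, which shows that an explicit SIBLINGS edge is a convenience rather than a requirement. MARRIED_TO, by contrast, cannot be derived and would need its own edge.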