I am currently using the embedded Python bindings for Neo4j. I do not have any issues at the moment since my graph is very small (sparse, and up to 100 nodes). The algorithm I am developing involves a lot of traversals over the graph, more specifically DFS over the whole graph as well as over different subgraphs. In the future I intend to run the algorithm on large graphs (presumably sparse, with millions of nodes).
Having read different threads related to the performance of the Python/Neo4j bindings (here, here), I wonder whether I should already switch to one of the REST API clients for Python (such as bulbflow, py2neo, or neo4jrestclient) before I am too far along to change all the code.
Unfortunately, I did not find any comprehensive source of information comparing the different approaches.
Could anyone provide some further insight into this issue? Which criteria should I take into account when choosing one of the options?
The easiest way to run algorithms from Python is to use Gremlin (https://github.com/tinkerpop/gremlin/wiki).
With Gremlin you can bundle everything into one HTTP request to reduce round-trip overhead.
Here's how to execute Gremlin scripts from Bulbs (http://bulbflow.com):
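A minimal sketch (assuming a local Neo4j Server and the g.gremlin.query() call shown in the Bulbs docs; the 'knows' edges and 'name' property are just illustrative):

    # Minimal sketch: run a whole multi-hop traversal server-side as one
    # Gremlin script, so only a single HTTP round trip is needed.
    from bulbs.neo4jserver import Graph

    g = Graph()  # defaults to the local Neo4j Server REST endpoint

    # Parameterised Gremlin script; 'id' is bound via the params dict.
    script = "g.v(id).out('knows').out('knows')"
    params = dict(id=3)

    # g.gremlin.query() returns initialised node/edge objects;
    # g.gremlin.execute() would return the raw response instead.
    friends_of_friends = g.gremlin.query(script, params)
    for node in friends_of_friends:
        print(node.name)  # assumes the nodes carry a 'name' property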
The Bulbs Gremlin API docs are here: http://bulbflow.com/docs/api/bulbs/gremlin/
I am not really sure (I am not an expert), but I think it also depends on your Django expectations and how much of a framework you need: py2neo is very pragmatic and slim, Bulbflow seems to build up a whole mapping stack, and neo4jrestclient concentrates on Django (though I may be wrong about that).
Django is an MVC web framework, so you may be interested in that if yours is to be a web application.
From the point of view of py2neo (of which I am the author), I am trying to focus hard on performance by using the batch execution mechanism automatically where appropriate, as well as providing strong Cypher support. I have also recently put a lot of work into providing good options for uniqueness management within indexes - specifically, the get_or_create and add_if_none methods.
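For illustration, a rough sketch of how those index methods might be used (the names GraphDatabaseService, get_or_create_index and Index.get_or_create here are assumptions based on the py2neo 1.x-era API and may differ in your version):

    # Rough sketch, assuming a py2neo 1.x-style API; method names and
    # signatures may differ between versions.
    from py2neo import neo4j

    graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

    # Obtain (or create) a node index used to enforce uniqueness.
    people = graph_db.get_or_create_index(neo4j.Node, "People")

    # Return the node already indexed under this key/value pair,
    # or create and index a new one in a single server-side operation.
    alice = people.get_or_create("email", "alice@example.com", {"name": "Alice"})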