I am building a data-intensive Python application based on Neo4j, and for performance reasons I need to create/retrieve several nodes and relationships during each transaction. Is there an equivalent of SQLAlchemy's session.commit() statement in Bulbs?
Edit: For those interested, an interface to Bulbs has been developed that implements that function natively and otherwise works pretty much like SQLAlchemy: https://github.com/chefjerome/graphalchemy
The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.
Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.
The project is called Lightbulb: https://github.com/espeed/lightbulb
The README describes what it does...
However, Neo4j stopped offering Gremlin on their free/test Heroku Add On, so Lightbulb won't work for new Neo4j/Heroku users.
Within the next year -- before the TinkerPop book comes out -- TinkerPop will release a Rexster Heroku Add On with full Gremlin support so people can run their projects on Heroku as they work their way through the book.
But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:
https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy
model.py provides an example for building custom Bulbs models and a custom Bulbs Graph class.
gremlin.groovy contains a custom Gremlin script that the custom Entry model executes -- this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.
Notice in the model.py file above, I customize EntryProxy by overriding the create() and update() methods and instead define a single save() method to handle creates and updates.
To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).
Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd have the overhead of making multiple server requests, and since the requests are separate, there would be no way to wrap them all in a transaction.
By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.
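To make the proxy-override pattern and the single-request idea concrete, here is a minimal, self-contained sketch in plain Python. It is not the Lightbulb code and it does not import Bulbs: RecordingClient, execute_script(), and get_params() are hypothetical stand-ins, and the NodeProxy, EntryProxy, and Entry classes below only mimic the shape of the real ones.

```python
class RecordingClient:
    """Hypothetical stand-in for a graph-server client; it only records requests."""

    def __init__(self):
        self.requests = []

    def execute_script(self, script_name, params):
        # A real client would send one HTTP request to the server here.
        self.requests.append((script_name, params))
        return {"script": script_name, "params": params}


class NodeProxy:
    """Stand-in for a default proxy with separate create() and update() calls."""

    def __init__(self, element_class, client):
        self.element_class = element_class
        self.client = client

    def create(self, **data):
        return self.client.execute_script("create_node", data)

    def update(self, _id, **data):
        return self.client.execute_script("update_node", dict(data, _id=_id))


class EntryProxy(NodeProxy):
    """Custom proxy: create() and update() are disabled in favour of save()."""

    def create(self, **data):
        raise NotImplementedError("use save() instead")

    def update(self, _id, **data):
        raise NotImplementedError("use save() instead")

    def save(self, **data):
        # Build all params up front, then run the whole job as ONE script call.
        params = self.element_class.get_params(**data)
        return self.client.execute_script("save_blog_entry", params)


class Entry:
    """Stand-in model that tells callers which proxy class to use."""

    @classmethod
    def get_proxy_class(cls):
        return EntryProxy

    @staticmethod
    def get_params(**data):
        # Placeholder: a real model would look up IDs and build bundles here.
        return dict(data)


client = RecordingClient()
entries = Entry.get_proxy_class()(Entry, client)
entries.save(title="Hello", tags=["neo4j", "gremlin"])
```

However many nodes and relationships the script ends up touching, the client in this sketch sees exactly one request, which is the property that lets the server wrap the work in one transaction.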
You can see how the entire script is executed in the final line of the Gremlin method:
return transaction(save_blog_entry);
Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Making a transaction closure keeps code isolated and is much cleaner than embedding the transaction logic into the other closures.
Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model.
The params I pass in are built up in the model's custom _get_params() method.
Here's what _get_params() is doing...
build_data(_data, kwds) is a function defined in bulbs.element: https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959
It simply merges the args in case the user entered some as positional args and some as keyword args.
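As a rough illustration of that merging step, here is a small sketch. The function name merge_args is hypothetical and this is not copied from Bulbs; it only shows the general idea of combining a positional dict with keyword args, and it assumes keyword args win on a key clash.

```python
def merge_args(_data, kwds):
    """Merge an optional positional dict with keyword args into one dict.

    Assumption of this sketch: the caller's dict is copied rather than
    mutated, and keyword args override positional entries on a key clash.
    """
    data = {} if _data is None else dict(_data)
    data.update(kwds)
    return data


def save(_data=None, **kwds):
    # Callers may pass a dict, keyword args, or a mix of both.
    return merge_args(_data, kwds)


mixed = save({"title": "Hello"}, author="james")
only_kwds = save(title="Hello", author="james")
```

Both calls end up with the same dict, so the rest of the method doesn't have to care how the user supplied the values.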
The first param I pass into _get_params() is author, which is the author's username, but I don't pass the username to the Gremlin script, I pass the author_id. The author_id is cached, so I use the username to look up the author_id and set that as a param, which I will later pass to the Gremlin save_blog_entry script.
Then I create Topic Model objects for each blog tag that was set, and I call get_bundle() on each and save them as a list of topic_bundles in params.
The get_bundle() method is defined in bulbs.model: https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363
It simply returns a tuple containing the data, index_name, and index keys for the model instance.
I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.
Finally, for Entry, I simply create an entry_bundle and store it as the param.
Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.
This params dict is passed directly to the Gremlin script, and the Gremlin script has the same arg names as those passed in by params.
The params are then simply used in the Gremlin script as needed -- nothing special going on.
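To show the shape of that params dict, here is a plain-Python sketch of the steps just described. It is not the Lightbulb implementation: the AUTHOR_IDS cache, the index names "topic" and "entry", and the standalone get_bundle() and get_params() functions are all hypothetical stand-ins, and tags are assumed to arrive as a list of strings.

```python
# Hypothetical cache mapping usernames to author IDs.
AUTHOR_IDS = {"james": 7}


def get_bundle(data, index_name, keys=None):
    """Stand-in for a model's get_bundle(): (data, index_name, index keys)."""
    return (data, index_name, keys)


def get_params(data, author_ids=AUTHOR_IDS):
    """Build the three params that the save script expects."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    params = {}

    # 1. Swap the username for the cached author ID.
    params["author_id"] = author_ids[data.pop("author")]

    # 2. One bundle per tag, collected into a list.
    tags = data.pop("tags")
    params["topic_bundles"] = [get_bundle({"name": tag}, "topic") for tag in tags]

    # 3. Whatever is left describes the entry itself.
    params["entry_bundle"] = get_bundle(data, "entry")
    return params


params = get_params(
    {"author": "james", "tags": ["neo4j", "gremlin"], "title": "Hello"}
)
```

Because each bundle is a single tuple, the script signature stays at three args no matter how many properties or tags an entry has.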
So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:
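The idea can be sketched in plain Python as follows. This is only an illustration with stand-in classes, not the Lightbulb listing: BaseGraph, Proxy, and build_proxy() here are hypothetical and are not imported from Bulbs, and the model classes are empty placeholders.

```python
class Proxy:
    """Hypothetical stand-in for a proxy bound to one model class."""

    def __init__(self, element_class, client):
        self.element_class = element_class
        self.client = client


class Person:
    """Placeholder model."""


class Topic:
    """Placeholder model."""


class Entry:
    """Placeholder model."""


class BaseGraph:
    """Stand-in for a library-provided graph class."""

    def __init__(self, client=None):
        self.client = client

    def build_proxy(self, element_class):
        return Proxy(element_class, self.client)


class Graph(BaseGraph):
    """Custom graph that exposes one proxy attribute per model."""

    def __init__(self, client=None):
        super().__init__(client)
        self.people = self.build_proxy(Person)
        self.topics = self.build_proxy(Topic)
        self.entries = self.build_proxy(Entry)


g = Graph()
```

Collecting the proxies in one subclass means application code only ever needs a single object (g.people, g.topics, g.entries) to reach every model.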
You can now import Graph directly from your app's model.py and instantiate the Graph object like normal.
Does that help?