Py2neo Neo4j Batch submit error

Posted 2019-08-01 02:42

Question:

I have a JSON file with data for around 1.4 million nodes, and I want to build a Neo4j graph database from it. I tried to use py2neo's batch submit function. My code is as follows:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for i in words:
    nodedict[i] = batch.create({"name":i})
results = batch.submit()

The error shown is as follows:

Traceback (most recent call last):
  File "test.py", line 36, in <module>
    results = batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 364, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Loc$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 278, in __init__
    raise SystemError(body)
SystemError: None

Can anybody please tell me what exactly is happening here? Does it have anything to do with the fact that the batch query is pretty large? If so, what can be done? Thanks in advance! :)

Answer 1:

So here's what I figured out (Thanks to this question: py2neo - Neo4j - System Error - Create Batch Nodes/Relationships):

The py2neo batch submit function has its own limits on the number of queries a single batch can carry. While I wasn't able to find an exact upper limit, I tried limiting the number of queries per batch to 5000. So I decided to run the following piece of code:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later

for index, i in enumerate(words):
    nodedict[i] = batch.create({"name":i})
    if (index + 1) % 5000 == 0:  # submit after every 5000th create, not at index 0
        batch.submit()
        batch = neo4j.WriteBatch(graph_db) # As stated by Nigel below, I'm creating a new batch
batch.submit() # for the final, possibly smaller, batch

This way, I sent the requests in batches of 5,000 queries each and was able to get my entire graph created!
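
One possible refinement, sketched under assumptions: putting the chunking into a small helper keeps the batch size easy to change and collects whatever each submission returns, in input order, so names can be mapped to results afterwards. `submit_in_chunks` and `fake_submit` are hypothetical names and not part of py2neo. `fake_submit` only stands in for the real work, which (going by the code above) would build a fresh `neo4j.WriteBatch(graph_db)`, call `batch.create` for each name and return `batch.submit()`.

```python
def submit_in_chunks(items, chunk_size, submit_chunk):
    """Call submit_chunk on consecutive slices of items and
    return all results concatenated in input order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        results.extend(submit_chunk(chunk))
    return results


chunk_sizes = []  # records how many items each call received


def fake_submit(chunk):
    # Hypothetical stand-in for a real batch submission.
    # Assumption: the real callback returns one result per item, in order.
    chunk_sizes.append(len(chunk))
    return ["node:" + name for name in chunk]


words = ["w%d" % n for n in range(12)]
created = submit_in_chunks(words, 5, fake_submit)

# Map each name to the result that came back for it.
nodedict = dict(zip(words, created))
```

With 12 names and a chunk size of 5, the callback is called three times, the last time with the 2 leftover names, so no separate "final batch" call is needed.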



Answer 2:

There's no real way to describe a limit on the number of jobs that a batch can contain - it can vary wildly based on a number of factors. The best bet in general is to experiment to find an optimum size for your use case and go with that. It looks like this is what you are already doing :-)
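
For anyone who wants to run that experiment, here is a rough sketch of a timing loop. Everything in it is illustrative: `time_chunk_sizes` and `fake_submit` are made-up names, and `fake_submit` merely stands in for a function that sends one chunk to the server. A real trial would presumably use a sample of the data against a scratch database, since every run actually creates nodes.

```python
import time


def time_chunk_sizes(items, sizes, submit_chunk):
    """Return {chunk_size: elapsed_seconds} for submitting all items
    in chunks of each candidate size."""
    timings = {}
    for size in sizes:
        started = time.time()
        for start in range(0, len(items), size):
            submit_chunk(items[start:start + size])
        timings[size] = time.time() - started
    return timings


def fake_submit(chunk):
    # Hypothetical stand-in; a real version would submit the chunk to Neo4j.
    return [name.upper() for name in chunk]


sample = ["w%d" % n for n in range(10000)]
timings = time_chunk_sizes(sample, [100, 1000, 5000], fake_submit)

# The candidate size with the smallest elapsed time.
best_size = min(timings, key=timings.get)
```

The numbers from the stand-in mean nothing by themselves; the point is only the shape of the loop. Against a real server it would also be worth repeating each size a few times, since single runs can be noisy.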

In terms of your solution, I'd recommend one tweak. Batch objects weren't designed to be reused, so instead of clearing the batch after every submission, simply create a new one. The ability to submit a batch multiple times will be removed in the next version of py2neo anyway.



Answer 3:

I had the same issue after I started using batch create via graph.create(*alist). The answers above pointed me in the right direction, and I ended up using this snippet, inspired by https://gist.github.com/anonymous/6293739 from the question "py2neo - Neo4j - System Error - Create Batch Nodes/Relationships":

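chunk_size = 500
# Slice alist into pieces of at most chunk_size items.
# range() works on both Python 2 and 3, unlike xrange().
chunks = (alist[pos:pos + chunk_size] for pos in range(0, len(alist), chunk_size))
for c in chunks:
    graph.create(*c)  # create one chunk at a time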

PS py2neo==2.0.7
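
A variation on the same idea, as a sketch only: slicing needs a list with a known length, so if the items come from a generator (for example, while streaming a large file), `itertools.islice` can produce the chunks instead. `iter_chunks` is a hypothetical helper name, not something py2neo provides.

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items from any iterable."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk


# Works on a generator, so the full input never has to sit in one list.
names = ("name-%d" % n for n in range(7))
chunks = list(iter_chunks(names, 3))
```

The loop from the snippet above would then read `for c in iter_chunks(alist, 500): graph.create(*c)`.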