py2neo - Neo4j - System Error - Create Batch Nodes

Posted 2019-08-04 12:43

Question:

I am trying to batch-create nodes and relationships, and the batch creation fails. The traceback is at the end of the post.

Note that the code works with a smaller subset of nodes. It fails once the number of relationships becomes very large; it is unclear at what limit this happens.

  • I am wondering whether I need to raise ulimit above 40,000 open files.
  • I read somewhere that people were running into Xstream issues with the REST API during batch creation. It is unclear whether the problem lies with py2neo, with the Neo4j server tuning/configuration, or with Python. Any guidance would be greatly appreciated.

One cluster within the data set ends up with around 625525 relationships among 700+ nodes. The total number of relationships will be 1M+. Hardware: Apple MacBook Pro Retina (x86_64) running Ubuntu 13.04, SSD, 8 GB memory.

  • Neo4j: auto_indexing and auto_relationships both set to ON
  • Nodes: clustered/grouped via Python pandas DataFrame.groupby()
  • Nodes: contain 3 properties
  • Relationship properties: 1; both incoming and outgoing relationships are created
  • ulimit set to 40,000 files open

Code

https://github.com/alienone/OSINT/blob/master/MANDIANTAPT/spitball.py

  • Operating System: Ubuntu 13.04
  • Python version: 2.7.5
  • py2neo Version: 1.5.1
  • Java version: 1.7.0_25-b15
  • Neo4j version: Community Edition 1.9.2

Traceback

Traceback (most recent call last):
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 63, in <module>
    main()
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 59, in main
    graph_db.create(*sorted_nodes)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 420, in create
    return batch.submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2123, in submit
    for response in self._submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2092, in _submit
    for id, request in enumerate(self.requests)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 428, in _send
    return self._client().send(request)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 365, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Location", None), rs_body)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 279, in __init__
    raise SystemError(body)
SystemError: None

Process finished with exit code 1

Answer 1:

I had a similar issue. One way to deal with it is to call batch.submit() on chunks of your data rather than on the whole data set. This is slower than a single batch, of course, but submitting one million nodes in chunks of 5000 is still faster than adding every node separately.

I use a small helper class to do this; note that all my nodes are indexed: https://gist.github.com/anonymous/6293739
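
For readers who cannot open the gist, the chunking idea can be sketched roughly as follows. This is not the helper class from the gist, only a minimal illustration. The names `chunked` and `create_in_chunks` are hypothetical, and `create` stands for any callable that accepts the items of one chunk as positional arguments, such as the `graph_db.create` call seen in the traceback above. The sketch assumes that callable returns a list of the created entities (or nothing).

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def create_in_chunks(create, items, size=5000):
    """Call `create(*chunk)` once per chunk and collect the results.

    `create` is a hypothetical stand-in for a batch call such as
    graph_db.create; it is assumed to return a list of created
    entities, or None.
    """
    created = []
    for chunk in chunked(items, size):
        result = create(*chunk)  # one request per chunk, not per item
        if result:
            created.extend(result)
    return created
```

Under those assumptions, usage would look something like `create_in_chunks(graph_db.create, sorted_nodes, size=5000)`. One caveat: if items in the list refer to each other by their position within a single call, those references would not survive being split across chunks and would have to be resolved against the entities returned by earlier chunks.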