I am batch loading a Neo4j graph with py2neo using this script:
batch = neo4j.WriteBatch(graph)
counter = 0
for each in ans:
    n1 = graph.merge_one("Page", "url", each[0])
    # batch.create(n1)
    counter += 1
    for linkvalue in each[6]:
        try:
            text, link = linkvalue.split('!__!')
            n2 = graph.merge_one("Page", "url", link)
            # batch.create(n2)
            counter += 1
            rel = Relationship(n1, 'LINKS', n2, anchor_text=text)
            batch.create(rel)
        except (KeyboardInterrupt, SystemExit):
            print 'fail'
            raise
        if counter > 900:
            counter = 0
            batch.submit()
            print 'submit'
            batch = neo4j.WriteBatch(graph)
Both merge_one calls hit the graph immediately, which I believe is what is slowing down my algorithm. I commented out the batch.create() calls because they were recreating the nodes. Is there a way to defer this work until batch.submit() to speed up the process?
I am handling about 50,000 nodes and 1,000,000 relationships.
You need to append statements to the WriteBatch and then run the batch once it reaches some number of statements. Here's an example:
Note that this example uses only Cypher statements and appends each statement to the WriteBatch. Also, this example uses two different WriteBatch instances.