I have started working with Neo4j and was exploring batch processing, but I am having problems creating relations between nodes.
My problem is the following. I have a list of websites and users that I read from a file. I may have repeated websites and users in that file, so I do not want to insert new nodes for those repeated entries. But as the file is big, I want to batch the processing of the nodes and relations.
Basically, I have these two functions to create nodes and relations and add them to the batch.
    from py2neo import neo4j, rel

    graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
    batch = neo4j.WriteBatch(graph_db)

    def create_node(pvalue, svalue, type):
        # Queue a node-creation request in the current batch.
        return batch.create({
            "pkey": pvalue,
            "skey": svalue,
            "type": type
        })

    def create_rel(from_node, type_label, to_node, fields):
        # Queue a relationship-creation request in the current batch.
        properties = {"ACCT_KEY": fields.ACCT_KEY}
        relation = rel(from_node, type_label, to_node, **properties)
        batch.create(relation)
Then, after using a dictionary to make sure I have not created the nodes before, I do:
    node1 = create_node("ATTRIBUTE_1", "ATTRIBUTE_2", "WEBSITE")
    node2 = create_node("ATTRIBUTE_3", "ATTRIBUTE_4", "USER")
    create_rel(node1, "VISITED_BY", node2, fields)
I save the references to "node1" and "node2" in a dictionary, so when I want to create a relation involving a website or a user that has already been registered, I do not create the node again but reuse the stored reference. I do this inside a loop and it works fine, until I decide to do this after a certain number of iterations:
    batch.submit()
    batch.clear()
When I decide to use those references from previous batches, I get the following error:
    Traceback (most recent call last):
      File "main.py", line 102, in <module>
        create_rel(cardholder, fraud_label, merchant,fields)
      File "main.py", line 33, in create_rel
        batch.create(relation)
      File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2775, in create
        "to": self._uri_for(entity.end_node)
      File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2613, in _uri_for
        uri = "{{{0}}}".format(self.find(resource))
      File "/usr/local/lib/python2.7/dist-packages/py2neo/neo4j.py", line 2604, in find
        raise ValueError("Request not found")
    ValueError: Request not found
I believe that this happens because it somehow loses the references from the previous batches and they are no longer valid. I have tried to collect the IDs from the nodes and use those instead, but I cannot find how to do it. Any help would be appreciated, thanks.
My Neo4j version is "2.0.3 community edition for Unix" and my py2neo version is 1.6.4.
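To illustrate what I think is happening, here is a toy model of the behaviour. It is only a sketch based on what the traceback suggests (a lookup of the request inside the current batch, formatted as a "{0}"-style placeholder). `ToyBatch`, its methods, and the "example.com" key are hypothetical stand-ins I made up for this example, not py2neo's actual code.

```python
class ToyBatch(object):
    """Simplified stand-in for a write batch; NOT py2neo's real implementation."""

    def __init__(self):
        self.requests = []

    def create(self, payload):
        # Queue the request and hand back a handle to it.
        self.requests.append(payload)
        return payload

    def find(self, request):
        # Look up the position of a queued request by identity.
        for index, queued in enumerate(self.requests):
            if queued is request:
                return index
        raise ValueError("Request not found")

    def reference(self, request):
        # Batch-local placeholder such as "{0}", meaningful only in this batch.
        return "{{{0}}}".format(self.find(request))

    def clear(self):
        # Forget every queued request, as happens after a flush.
        self.requests = []


batch = ToyBatch()
cache = {}  # dictionary used to avoid creating the same node twice

cache["example.com"] = batch.create({"pkey": "example.com", "type": "WEBSITE"})
first_ref = batch.reference(cache["example.com"])  # resolves inside the same batch

batch.clear()  # stands in for submit() followed by clear()

try:
    batch.reference(cache["example.com"])  # the cached handle is now stale
    stale_error = None
except ValueError as exc:
    stale_error = str(exc)
```

If this model is right, the handles stored in my dictionary only make sense inside the batch that created them, so after a flush I would need to store something that outlives the batch instead.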