I want to create a linked list using Cypher over REST.
If I create the head of the list using the following query:
MERGE (headNode:HEAD {list:"mylist"})
WITH headNode
MERGE headNode-[:LINK]->(headNode)
RETURN headNode
And then do the insert using this query:
MERGE (headNode:HEAD {list:"mylist"})-[old:LINK]->after
DELETE old
CREATE headNode-[:LINK]->(newNode:LINKNODE { number : {nodeNumber} })-[:LINK]->after
Then everything is fine as long as I don't run multiple insert queries in parallel. But when I do I start to get consequences that are bad. I either get (depending on the timing) an error like:
Error: Relationship 391112 not found
Or I get multiple linked lists snaking out of the head node. I have set up a test node.js project that replicates the problem here.
How does one create a linked list in Neo4j that can handle parallel insertion?
I recommend not trying to second guessing how the locking works. Let's instead build something with the better-documented tools and features at our disposal.
Say we have a unique constraint like this:
create constraint on (n:LINK) assert n.list_head_id is unique
;
Then we can concurrently insert at the head of a list (potentially creating it in the process) like this:
merge (current_head:LINK {list_head_id: {list_id}})
on create set current_head.is_sentinel = true
remove current_head.list_head_id
create (new_head:LINK {list_head_id: {list_id}})-[:NEXT]->current_head
;
This kind of list always ends with a sentinel link, so you need to ignore that when you query it.
Now, the above can produce deadlock exceptions that will fail the transaction. Potentially a lot of them. So you need to make sure to retry those transactions that fail with deadlocks.
I wrote my own test in Java to demonstrate this approach. Note that in this test, I introduce a small random delay between transactions to reduce the probability of deadlocks. This is because the test is using an embedded database, so there’s no network or anything to otherwise limit the amount and speed of the concurrency I can throw at it: https://gist.github.com/chrisvest/9033600
In order to do parallel insert you need to take a lock on the node(s) in question:
MATCH (headNode:HEAD {list:"mylist"})-[old:LINK]->after
REMOVE headNode._lock_
REMOVE after._lock_
DELETE old
CREATE headNode-[:LINK]->(newNode:LINKNODE { number : {nodeNumber} })-[:LINK]->after
RETURN headNode, newNode
the lines
REMOVE headNode._lock_
REMOVE after._lock_
remove a non-existent property, when a property is removed a lock is taken. Now if the query can not insert due to the lock you will get a Neo.DatabaseError.Statement.ExecutionFailure which means the insert failed and you can re-run the query.