My import.csv creates many nodes, and merging them creates a huge cartesian product that now runs into a transaction timeout because the data has grown so much. I've currently set the transaction timeout to 1 second because every other query is very quick and is not supposed to take longer than one second to finish.
Is there a way to split or execute this specific query in smaller chunks to prevent a timeout?
Upping or disabling the transaction timeout in neo4j.conf is not an option because the neo4j service needs a restart for every change made in the config.
The query hitting the timeout from my import script:
MATCH (l:NameLabel)
MATCH (m:Movie {id: l.id,somevalue: l.somevalue})
MERGE (m)-[:LABEL {path: l.path}]->(l);
Node counts: 1000 Movie, 2500 Namelabel
You can try installing APOC Procedures and using the procedure apoc.periodic.commit.
It executes the query you pass to it repeatedly, each run in a separate transaction, until the query returns 0.
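A minimal sketch of such a call, built from the query in the question. It assumes APOC is installed and a Neo4j version that accepts the $param parameter syntax. The WHERE NOT filter is an addition, not part of the original query: it skips pairs that are already linked, so every batch makes progress and the final run returns 0.

```cypher
// Sketch only: each run links at most $limit (Movie, NameLabel) pairs
// and commits; APOC repeats the statement until count(*) is 0.
CALL apoc.periodic.commit("
  MATCH (l:NameLabel)
  MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
  WHERE NOT (m)-[:LABEL {path: l.path}]->(l)
  WITH m, l LIMIT $limit
  MERGE (m)-[:LABEL {path: l.path}]->(l)
  RETURN count(*)
", {limit: 1000});
```

Each run should stay under the 1 second timeout only if a single batch is small enough, so the limit may need tuning for your data.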
You can change the value of {limit: 1000} to adjust the batch size.
Note: remember to install the APOC Procedures release that matches the version of Neo4j you are using. Take a look at the Version Compatibility Matrix.
The number of nodes and labels in your database suggests this is an indexing problem. Do you have constraints on both the Movie and Namelabel nodes (the label should be NameLabel, since node labels are conventionally written in CamelCase)? The appropriate constraints should be in place and active.
If your Movie nodes have unique names, then create a uniqueness constraint with the CREATE CONSTRAINT statement. If one of the nodes is not unique but will be used in a relationship definition, then use the CREATE INDEX ON statement.
With such a small dataset it may not be readily apparent how inefficient your queries are. Try the PROFILE command and see how many nodes are being searched. Your MERGE statement should only check a couple of nodes at each step.
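As a rough illustration of the statements mentioned above, written in the older syntax that matches CREATE INDEX ON. Which properties are unique is an assumption here: the sketch treats Movie.id as unique and NameLabel.id as non-unique, so adjust it to your data. Newer Neo4j versions use a different form (CREATE CONSTRAINT ... FOR ... REQUIRE and CREATE INDEX ... FOR ... ON).

```cypher
// Assumption: Movie.id is unique. A uniqueness constraint also
// creates an index that the planner can use for lookups by id.
CREATE CONSTRAINT ON (m:Movie) ASSERT m.id IS UNIQUE;

// Assumption: NameLabel.id is not unique, so a plain index is used.
CREATE INDEX ON :NameLabel(id);

// Profile only the read part of the import query, so nothing is written.
// Look for index seeks on Movie instead of a label scan per NameLabel row.
PROFILE
MATCH (l:NameLabel)
MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
RETURN count(*);
```

If the profile still shows a full scan of Movie nodes for each NameLabel row, the index is probably not online yet or not on the property used in the lookup.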