My import.csv
creates many nodes and merging creates a huge cartesian product and runs in a transaction timeout
since the data has grown so much. I've currently set the transaction timeout to 1 second because every other query is very quick and is not supposed to take any longer than one second to finish.
Is there a way to split or execute this specific query in smaller chunks to prevent a timeout?
Upping or disabling the transaction timeout
in the neo4j.conf
is not an option because the neo4j service needs a restart for every change made in the config.
The query hitting the timeout from my import script:
MATCH (l:NameLabel)
MATCH (m:Movie {id: l.id,somevalue: l.somevalue})
MERGE (m)-[:LABEL {path: l.path}]->(l);
Nodecounts: 1000 Movie, 2500 Namelabel
You can try installing APOC Procedures and using the procedure apoc.periodic.commit.
call apoc.periodic.commit("
MATCH (l:Namelabel)
WHERE NOT (l)-[:LABEL]->(:Movie)
WITH l LIMIT {limit}
MATCH (m:Movie {id: l.id,somevalue: l.somevalue})
MERGE (m)-[:LABEL {path: l.path}]->(l)
RETURN count(*)
",{limit:1000})
The below query will be executed repeatedly in separate transactions until it returns 0.
You can change the value of {limit : 1000}
.
Note: remember to install APOC Procedures according the version of Neo4j you are using. Take a look in the Version Compatibility Matrix.
The number of nodes and labels in your database suggest this is an indexing problem. Do you have constraints on both the Movie and Namelabel (which should be NameLabel since it is a node) nodes? The appropriate constraints should be in place and active.
Indexing and Performance
Make sure to have indexes and constraints declared and ONLINE for
entities you want to MATCH or MERGE on
Always MATCH and MERGE on a
single label and the indexed primary-key property
Prefix your load
statements with USING PERIODIC COMMIT 10000 If possible, separate node
creation from relationship creation into different statements
If your
import is slow or runs into memory issues, see Mark’s blog post on
Eager loading.
If your Movie nodes have unique names then use the CREATE UNIQUE
statement. - docs
If one of the nodes is not unique but will be used in a relationship definition then the CREATE INDEX ON
statement. With such a small dataset it may not be readily apparent how inefficient your queries are. Try the PROFILE
command and see how many nodes are being searched. Your MERGE
statement should only check a couple nodes at each step.