Neo4j long lasting query to be split/executed in s

Posted 2019-08-03 16:39

My import.csv creates many nodes, and the merge that follows builds a huge Cartesian product. Since the data has grown so much, it now runs into the transaction timeout. I have currently set the transaction timeout to 1 second, because every other query is very quick and should never take longer than one second to finish.

Is there a way to split or execute this specific query in smaller chunks to prevent a timeout?

Raising or disabling the transaction timeout in neo4j.conf is not an option, because the Neo4j service needs a restart for every change made to the config.

The query hitting the timeout from my import script:

 MATCH (l:NameLabel)
 MATCH (m:Movie {id: l.id,somevalue: l.somevalue})
 MERGE (m)-[:LABEL {path: l.path}]->(l);

Node counts: 1000 Movie, 2500 Namelabel

2 answers
做个烂人
Reply 2 · 2019-08-03 16:57

You can try installing APOC Procedures and using the procedure apoc.periodic.commit.

// Each run picks up to $limit Namelabel nodes that no Movie points to yet.
CALL apoc.periodic.commit("
  MATCH (l:Namelabel)
  WHERE NOT (:Movie)-[:LABEL]->(l)
  WITH l LIMIT $limit
  MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
  MERGE (m)-[:LABEL {path: l.path}]->(l)
  RETURN count(*)
", {limit: 1000})

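The query above is executed repeatedly, each run in a separate transaction, until it returns 0. Be aware that a batch made up only of Namelabel nodes without a matching Movie also returns 0 and therefore ends the loop.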

You can change the batch size by adjusting the value in {limit: 1000}.

Note: remember to install the APOC Procedures release that matches the version of Neo4j you are using. Take a look at the Version Compatibility Matrix.
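
As an alternative sketch (a different procedure from the one in this answer), APOC also provides apoc.periodic.iterate. It runs a driving query once and applies an action statement to its rows in batches, each batch in its own transaction. Because every Namelabel node is visited exactly once, the loop does not depend on a termination condition. The batch size of 1000 and parallel: false are assumptions to tune for your data, and whether each batch stays under your timeout depends on that data:

```cypher
// The first statement streams the Namelabel nodes once; the second is
// executed per batch of 1000 rows, each batch in its own transaction.
CALL apoc.periodic.iterate(
  "MATCH (l:Namelabel) RETURN l",
  "MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
   MERGE (m)-[:LABEL {path: l.path}]->(l)",
  {batchSize: 1000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```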

萌系小妹纸
Reply 3 · 2019-08-03 16:58

The number of nodes and labels in your database suggests this is an indexing problem. Do you have constraints on both the Movie and Namelabel nodes? (By convention a node label is written in CamelCase, so it should be NameLabel.) The appropriate constraints should be in place and active.

Indexing and Performance

Make sure to have indexes and constraints declared and ONLINE for entities you want to MATCH or MERGE on

Always MATCH and MERGE on a single label and the indexed primary-key property

Prefix your load statements with USING PERIODIC COMMIT 10000

If possible, separate node creation from relationship creation into different statements

If your import is slow or runs into memory issues, see Mark’s blog post on Eager loading.
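
A minimal sketch of periodic commit combined with separate node and relationship passes, in Neo4j 3.x/4.x syntax. The file name import.csv comes from the question. The column names id, somevalue, and path are assumptions, since the CSV layout is not shown, and the Namelabel nodes are assumed to have been created by an earlier pass of the same kind:

```cypher
// Pass 1: nodes only. Column names are hypothetical.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS row
MERGE (m:Movie {id: row.id})
SET m.somevalue = row.somevalue;

// Pass 2: relationships only, between nodes that already exist.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS row
MATCH (m:Movie {id: row.id, somevalue: row.somevalue})
MATCH (l:Namelabel {id: row.id})
MERGE (m)-[:LABEL {path: row.path}]->(l);
```

Keeping the passes separate means each statement matches on a single label and property set, which is what lets an index do the lookup.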

If your Movie nodes have unique names, declare a uniqueness constraint on that property with a CREATE CONSTRAINT ... ASSERT ... IS UNIQUE statement.

If one of the node types is not unique but is used to look up the relationship endpoints, create an index on it with the CREATE INDEX ON statement. With such a small dataset it may not be readily apparent how inefficient your queries are. Try the PROFILE command and see how many nodes are being searched. Your MERGE statement should only check a couple of nodes at each step.
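
A minimal sketch of the constraint, index, and PROFILE steps in Neo4j 3.x syntax (newer versions use a different syntax for these statements). Uniqueness is declared with CREATE CONSTRAINT. The sketch assumes that Movie.id is unique and Namelabel.id is not, neither of which is confirmed by the question:

```cypher
// Uniqueness constraint; it also creates a backing index.
// Assumes Movie.id is unique, otherwise this statement fails.
CREATE CONSTRAINT ON (m:Movie) ASSERT m.id IS UNIQUE;

// Plain index for a lookup property that is not unique.
CREATE INDEX ON :Namelabel(id);

// List the indexes and check that their state is ONLINE before importing.
CALL db.indexes();

// Profile the lookup without writing anything: the plan should show
// index seeks on Movie rather than a scan per Namelabel row.
PROFILE
MATCH (l:Namelabel)
MATCH (m:Movie {id: l.id, somevalue: l.somevalue})
RETURN count(*);
```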
