I'm trying to import a medium-sized data set of about 500,000 nodes into Neo4j using Cypher. I am running neo4j-community-2.0.0-M05 locally on my 3.4 GHz i7 iMac with an SSD.
I am piping the Cypher into neo4j-shell, wrapping every 40k statements in a transaction.
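For context, the generated file looks roughly like this; begin and commit are neo4j-shell commands, and everything between one begin/commit pair runs as a single transaction (the statements and the file name are just illustrative):

    begin
    MATCH n:Artifact WHERE n.pathId = '...' CREATE UNIQUE n-[r:DEPENDS_ON]->(a:Artifact {pathId: '...'});
    ...about 40k statements...
    commit
    begin
    ...next 40k statements...
    commit

I then feed it to the shell with something like ./bin/neo4j-shell -file import.cql.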
I am using labels, and before I started the import I created an index on one property per label.
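For example, for the Artifact nodes in the query below, the index was created with the 2.0 schema-index syntax:

    CREATE INDEX ON :Artifact(pathId);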
When I left last night, the MATCH ... CREATE UNIQUE statements were taking about 15 ms each. This morning they are taking about 6,000 ms.
The slow queries look something like this:
    MATCH n:Artifact WHERE n.pathId = 'ZZZ'
    CREATE UNIQUE n-[r:DEPENDS_ON]->(a:Artifact {pathId: 'YYY'})
    RETURN a

which returns:

    1 row
    5719 ms
pathId is indexed.
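(If anyone else hits this, it's worth running the schema command in neo4j-shell to confirm the index is actually ONLINE rather than still POPULATING; an index that is still populating won't speed these lookups up:

    schema

)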
I understand this is a milestone build and probably not optimized for performance, but I'm less than a third of the way through my import and it is slowing down more and more.
Should I look at methods other than Cypher to import this data?
I just want to answer my own question in case someone else finds this. Thanks to Peter for suggesting the batch import project. I used the 2.0 tree.
My workflow ended up being to (1) load all the data into a relational database, (2) clean up duplicates, and then (3) write a script to export the data into CSV files.
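For reference, the batch importer takes tab-separated node and relationship files, where relationships refer to nodes by row number. The exact header conventions (property types, index names, labels) are documented in the project's README, so treat the shape below as a rough illustration from memory:

    nodes.csv (tab-separated, one node per row):
    pathId
    ZZZ
    YYY

    rels.csv (tab-separated; start and end are node row numbers):
    start   end   type
    1       2     DEPENDS_ON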
Using Cypher, I had the import running for 24 hours before I killed it. Using the Java import tool, the entire import took 11 seconds with neo4j-community-2.0.0-M06.
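From memory, running the importer amounted to building the jar from the 2.0 tree and pointing it at a fresh store directory plus the two files; check the project README for the exact command, as this is only approximate:

    mvn clean compile assembly:single
    java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar graph.db nodes.csv rels.csv

The resulting graph.db directory then gets dropped into Neo4j's data directory.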
Bottom line: don't bother writing out Cypher to import large chunks of data. Spend an hour cleaning up your data if necessary, then export to CSV and use the Java batch import tool.