I have a data set of 3.8 million nodes and I'm trying to load all of them into Neo4j Spatial. The nodes are going into a simple point layer, so they have the required latitude and longitude fields. I've tried:
MATCH (d:pointnode)
WITH collect(d) AS pn
CALL spatial.addNodes("point_geom", pn) YIELD count RETURN count
But this just keeps spinning without anything happening. I've also tried the following (I've been running it all on one line, but I've split it up here for ease of reading):
CALL apoc.periodic.iterate("MATCH (d:pointnode)
WITH collect(d) AS pnodes RETURN pnodes",
"CALL spatial.addNodes('point_geom', pnodes) YIELD count RETURN count",
{batchSize:10000, parallel:false, listIterate:true})
But again, a lot of spinning and the occasional Java heap error.
The final approach I tried was to use FME with the HTTP caller. This works, but it is exceptionally slow, so it doesn't scale well for millions of nodes.
Any advice or suggestions would be much appreciated. Would apoc.periodic.commit or apoc.periodic.rock_n_roll be a better choice than apoc.periodic.iterate?
After a bit of trial and error, apoc.periodic.commit has led to a relatively quick solution (it's still going to take 2-3 hours).
It may be quicker with larger batch sizes.
EDIT: with a batch size of 5000 it takes 45 minutes.
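The final query wasn't posted, but a minimal sketch of what an apoc.periodic.commit approach could look like is below. The added marker property is purely hypothetical (it just stops each pass from re-processing the same nodes and can be removed afterwards); the pointnode label, point_geom layer and 5000 batch size are taken from the question.

CALL apoc.periodic.commit(
  "MATCH (d:pointnode)
   WHERE d.added IS NULL          // hypothetical marker so each pass only sees unprocessed nodes
   WITH d LIMIT $limit
   SET d.added = true
   WITH collect(d) AS pnodes
   WHERE size(pnodes) > 0         // guard so the loop ends once nothing is left to add
   CALL spatial.addNodes('point_geom', pnodes) YIELD count
   RETURN count",
  {limit: 5000})

Each iteration commits its own transaction, so the heap only ever has to hold one batch of 5000 nodes rather than the full 3.8 million.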
You have 3 800 000 nodes, you collect them into one list ... and then you make a single call to add that entire list to the layer ... that is going to take a while and eat loads of memory. apoc.periodic.iterate makes absolutely no difference here, because the driving query returns just one row (the collected list), so there is still only one call to spatial.addNodes ...
It may take a while, but why not add them node by node?
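As an illustration only, a sketch of what that node-by-node approach might look like, using spatial.addNode (singular) inside apoc.periodic.iterate so that the driving statement streams individual nodes instead of one collected list; the label, layer name and batch size are taken from the question:

CALL apoc.periodic.iterate(
  "MATCH (d:pointnode) RETURN d",
  "CALL spatial.addNode('point_geom', d) YIELD node RETURN node",
  {batchSize:10000, parallel:false})   // keep parallel:false, as in the original attempt

Because the driving statement returns one row per node, batchSize actually does its job here: roughly, each transaction handles a batch of 10000 nodes before committing.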
Hope this helps (or at least explains why you are having issues).
Regards, Tom