How to load a large csv file into Neo4j

2019-08-02 08:43发布

问题:

I'm trying to load a large csv file (1458644 row) into neo4j, but i'm still getting this error :

Neo.TransientError.General.OutOfMemoryError: There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.

even if i change dbms.memory.heap.max_size=1024m with m=megbite , the same error occurs again !

Note : the size of the csv is 195.888 KB

this is my code :

load csv with headers from "file:///train.csv" as line
create(pl:pickup_location{latitude:toFloat(line.pickup_latitude),longitude:toFloat(line.pickup_longitude)}),(pt:pickup_time{pickup:line.pickup_datetime}),(dl:dropoff_location{latitude:toFloat(line.dropoff_latitude),longitude:toFloat(line.dropoff_longitude)}),(dt:dropoff_time{dropoff:line.dropoff_datetime})
create (pl)-[:TLR]->(pt),(dl)-[:TLR]->(dt),(pl)-[:Trip]->(dl);

what should i do ?

回答1:

You should use periodic commits to process the CSV data in batches. For example, this will process 10,000 lines at a time (the default batch size is 1000):

USING PERIOD COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///train.csv" as line
CREATE (pl:pickup_location{latitude:toFloat(line.pickup_latitude),longitude:toFloat(line.pickup_longitude)}),(pt:pickup_time{pickup:line.pickup_datetime}),(dl:dropoff_location{latitude:toFloat(line.dropoff_latitude),longitude:toFloat(line.dropoff_longitude)}),(dt:dropoff_time{dropoff:line.dropoff_datetime})
CREATE (pl)-[:TLR]->(pt),(dl)-[:TLR]->(dt),(pl)-[:Trip]->(dl);


回答2:

I Solved the problem by copying the solution for limited row here

so this is my solution:

USING PERIODIC COMMIT
load csv with headers from "file:///train.csv" as line
with line LIMIT 1458644
create   (pl:pickup_location{latitude:toFloat(line.pickup_latitude),longitude:toFloat(line.pickup_longitude)}),(pt:pickup_time{pickup:line.pickup_datetime}),(dl:dropoff_location{latitude:toFloat(line.dropoff_latitude),longitude:toFloat(line.dropoff_longitude)}),(dt:dropoff_time{dropoff:line.dropoff_datetime})
create (pl)-[:TLR]->(pt),(dl)-[:TLR]->(dt),(pl)-[:Trip]->(dl);

the downside of this solution is that you need to know the number of rows of your big csv file (excel can't open large csv files).



标签: neo4j cypher