My question is very similar to this one: How to create unique nodes and relationships by csv file imported in neo4j? I have a text file of around 2.5 million lines with two columns, each holding a node ID:
1234 345
1234 568
345 984
... ...
Each line represents a relationship (so 2.5 million relationships): first_column nodeid-> FOLLOWS -> second_column nodeid. There are around 80,000 unique nodes in this file.
Based on the link above, I did:
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///home/user_name/Desktop/bigfile.csv' AS line FIELDTERMINATOR ' '
MERGE (n:Userid { id: toInt(line[0]) })
WITH line, n
MERGE (m:Userid { id: toInt(line[1]) })
WITH m,n
MERGE (n)-[:FOLLOWS]->(m)
I am assuming this code
- creates node n or m if it doesn't exist (and finds it if it does exist), and creates a relationship from n to m.
- If n or m exists and already has many other edges (relationships) pointing to and from other nodes, this would just add another edge from n to m (not creating a brand new node when it already exists)
My main question is how to make this process faster. I am running this on Ubuntu, and I raised the memory values in conf/neo4j-wrapper.conf from 512 MB to 2048 MB (the maximum I can allocate on my virtual machine).
Should I try the import tool instead? I followed the example on neo4j.com/developer/guide-import-csv/ under "Super Fast Batch Importer For Huge Datasets":
./bin/neo4j-import --into mydatabase.db --id-type INTEGER \
--nodes allnodes.csv \
--delimiter " " \
--relationships:FOLLOWS bigfile.csv
To do this, I need to reformat the files so that allnodes.csv contains:
userID:ID(Userid)
1234
5678
...
and bigfile.csv contains:
:START_ID(Userid) :END_ID(Userid)
1234 345
1234 568
345 984
*Two columns delimited by space*
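For reference, the reformatting described above could be scripted roughly as follows. This is only a sketch: the function names and the raw_edges.txt input name are made up for illustration, and it assumes the raw file has exactly two space-separated integer ids per line and no header.

```shell
# Sketch: derive the two import files from the raw two-column edge list.
# Function names and raw_edges.txt are illustrative, not part of Neo4j.
# The raw input file must be a different file from either output file.

# Write the node file: header line, then every distinct id from either column.
make_nodes_file() {
    raw="$1"; out="$2"
    {
        printf 'userID:ID(Userid)\n'
        awk 'NF >= 2 { print $1; print $2 }' "$raw" | sort -n -u
    } > "$out"
}

# Write the relationship file: header line, then the edge list unchanged.
make_rels_file() {
    raw="$1"; out="$2"
    {
        printf ':START_ID(Userid) :END_ID(Userid)\n'
        cat "$raw"
    } > "$out"
}

# Example (file names are placeholders):
#   make_nodes_file raw_edges.txt allnodes.csv
#   make_rels_file  raw_edges.txt bigfile.csv
```

Collecting ids from both columns matters because a node that only ever appears as a follow target would otherwise be missing from allnodes.csv.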
And when I run this import, I get this error:
Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []
Caused by:Expected '--nodes' to have at least 1 valid item, but had 0 []
java.lang.IllegalArgumentException: Expected '--nodes' to have at least 1 valid item, but had 0 []
How do I fix this error? Also, do the CSV files need to be in the same folder I run this command from (the neo4j folder)?
Your command line probably has the wrong paths for your two CSV files.
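Relative file names such as allnodes.csv are resolved against the directory the shell is in when you run the command, so one way to rule out a path problem is to check the files first and then pass absolute paths. A minimal sketch follows; check_import_files is a hypothetical helper, and the /home/user_name/Desktop directory is only an assumption taken from the LOAD CSV path in the question.

```shell
# Sketch: fail early if any import file is missing or unreadable.
# check_import_files is a hypothetical helper, not a Neo4j command.
check_import_files() {
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            printf 'missing or unreadable: %s\n' "$f" >&2
            return 1
        fi
    done
    return 0
}

# Example, assuming the files live in /home/user_name/Desktop
# (an assumption; substitute the real location):
#   dir=/home/user_name/Desktop
#   check_import_files "$dir/allnodes.csv" "$dir/bigfile.csv" &&
#   ./bin/neo4j-import --into mydatabase.db --id-type INTEGER \
#       --nodes "$dir/allnodes.csv" \
#       --delimiter " " \
#       --relationships:FOLLOWS "$dir/bigfile.csv"
```

If the check passes and the import still reports zero valid items for --nodes, the cause is probably something other than the file location.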