How to do an initial batch import of CSV / MySQL d

I am considering replacing a MySQL database with a neo4j database. I am a complete beginner with neo4j and would like to know how to batch insert my current MySQL data into a neo4j database so I can experiment and begin to learn about neo4j.

The relational database consists of 4 tables: Person, Organism, Story, Links. Links describes relationships between rows in the other 3 tables.

Links: ID, FromTable, FromID, ToTable, ToID, LinkType

Person: ID, property_2, property_1, etc ...

Organism: ID, property_A, property_B, etc ....

Story: ID, property_x, property_y

Each ID field is an auto-incrementing integer starting from 1 for each table.

In case it is not obvious, a link between, say, the person with ID 3 and the story with ID 42 would have a row in the Links table with ID=autoincrement, FromTable=Person, FromID=3, ToTable=Story, ToID=42. Even though I am using the terms 'from' and 'to', the links are not really directed in practice.

I have looked at Michael Hunger's batch-import but that seems to only work with a single table of nodes and one table of relationships, whereas I am looking to import three different types of nodes and one list of relationships between them.

I have got neo4j up and running. Any advice to get me started would be greatly appreciated.

I am not familiar with Java, though I do use Python and bash shell scripts. After the initial import, I will be using the RESTful interface with JavaScript.

Tags: database import converter neo4j graph-databases

2 Answers

在下西门庆

Post #2 · 2019-03-21 13:42

Based on advice in the git repo: using Michael Hunger's batch-import, it is possible to import multiple node types from a single .csv file. To quote Michael:

Just put them all into one nodes file, you can have any attribute not having a value in a certain row, it will then just be skipped.

So the general approach I used was:

Combine all the node tables into a new table called nodes:

Create a new table nodes with an auto-incrementing newID field and a type field. The type field records which table the node data came from.
Add all the possible column names from the 3 node tables (including the original ID), allowing nulls.
INSERT INTO nodes the values from Person, then Organism, then Story, setting the type field to Person, Organism, or Story to match the values stored in Links.FromTable and Links.ToTable. Leave any unrelated fields blank.
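The three steps above can be tried out on a toy dataset before touching the real database. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the columns name, species, and title are made-up placeholders for the real properties.

```python
import sqlite3

# SQLite stands in for MySQL here; every column except ID is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person   (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Organism (ID INTEGER PRIMARY KEY, species TEXT);
    CREATE TABLE Story    (ID INTEGER PRIMARY KEY, title TEXT);

    INSERT INTO Person   (name)    VALUES ('Ada'), ('Bob');
    INSERT INTO Organism (species) VALUES ('E. coli');
    INSERT INTO Story    (title)   VALUES ('First story');

    -- Combined table: newID is unique across all node types,
    -- ID keeps the original per-table ID, type records the source table.
    CREATE TABLE nodes (
        newID   INTEGER PRIMARY KEY AUTOINCREMENT,
        type    TEXT NOT NULL,
        ID      INTEGER NOT NULL,
        name    TEXT,
        species TEXT,
        title   TEXT
    );

    -- Columns that do not apply to a node type are simply left NULL.
    INSERT INTO nodes (type, ID, name)
        SELECT 'Person', ID, name FROM Person ORDER BY ID;
    INSERT INTO nodes (type, ID, species)
        SELECT 'Organism', ID, species FROM Organism ORDER BY ID;
    INSERT INTO nodes (type, ID, title)
        SELECT 'Story', ID, title FROM Story ORDER BY ID;
""")

nodes = conn.execute(
    "SELECT newID, type, ID, name, species, title FROM nodes ORDER BY newID"
).fetchall()
for row in nodes:
    print(row)
```

Because every source table starts its IDs at 1, the pair (type, ID) is what identifies a node in the combined table; newID is the single number the import then works with.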

In another new table, rels, replace the per-table IDs from the Links table with the newly created newID values using a SQL JOIN:

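-- Assumes rels already exists with its columns in this order:
-- fromNodeID, toNodeID, LinkType, ID
INSERT INTO rels
SELECT
    n1.newID AS fromNodeID,
    n2.newID AS toNodeID,
    L.LinkType,
    L.ID
FROM
    Links L
-- Match each endpoint on both the original ID and the source table name.
LEFT JOIN
    nodes n1
    ON
    L.FromID = n1.ID
    AND
    L.FromTable = n1.type
LEFT JOIN
    nodes n2
    ON
    L.ToID = n2.ID
    AND
    L.ToTable = n2.type;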
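For readers more comfortable with Python than SQL, the lookup this join performs can be sketched as a dictionary keyed on (type, ID). The function name resolve_links and the sample LinkType value are hypothetical. Unlike the LEFT JOIN, which leaves NULL in fromNodeID or toNodeID when a link points at a missing row, this sketch collects such links separately so they can be inspected.

```python
def resolve_links(nodes, links):
    """Translate each link's (table, ID) endpoints into combined newID values."""
    # (type, original ID) -> newID: the same pair the SQL join matches on.
    new_id = {(n["type"], n["ID"]): n["newID"] for n in nodes}
    rels, dangling = [], []
    for link in links:
        start = new_id.get((link["FromTable"], link["FromID"]))
        end = new_id.get((link["ToTable"], link["ToID"]))
        if start is None or end is None:
            # One endpoint has no matching node; keep the link ID for review.
            dangling.append(link["ID"])
            continue
        rels.append((start, end, link["LinkType"], link["ID"]))
    return rels, dangling


# Toy data; "wrote" is a made-up LinkType.
nodes = [
    {"newID": 1, "type": "Person", "ID": 1},
    {"newID": 2, "type": "Story", "ID": 1},
]
links = [
    {"ID": 1, "FromTable": "Person", "FromID": 1,
     "ToTable": "Story", "ToID": 1, "LinkType": "wrote"},
    {"ID": 2, "FromTable": "Person", "FromID": 9,
     "ToTable": "Story", "ToID": 1, "LinkType": "wrote"},
]

rels, dangling = resolve_links(nodes, links)
print(rels)
print(dangling)
```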

Then export these two new tables, nodes and rels, as tab-separated .csv files and use them with batch-import:

$ java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/graph.db nodes.csv rels.csv
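If the export is done from a script rather than from a MySQL client, the tab-separated files can be written with Python's csv module. This is a rough sketch only: write_tsv is a hypothetical helper, and the header names shown are placeholders, so check the batch-import README for the header row it actually expects.

```python
import csv
import io


def write_tsv(out, header, rows):
    """Write a header row and data rows as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # NULL columns (None in Python) become empty fields.
        writer.writerow(["" if value is None else value for value in row])


# Demo with an in-memory buffer; pass open("nodes.csv", "w", newline="")
# instead to write a real file. Header names here are assumptions.
buf = io.StringIO()
write_tsv(
    buf,
    ["type", "name", "title"],
    [("Person", "Ada", None), ("Story", None, "First story")],
)
print(buf.getvalue())
```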


闹够了就滚

Post #3 · 2019-03-21 13:47

Since you say you are happy working with Python and shell scripts, you may also want to look at the command-line tools that come with py2neo, in particular geoff. It uses a flat file format I put together for holding graph data, so in your case you would build a flat file from your source data and load it into your graph database.

The file format and server plugin are documented here and the py2neo module for the client application is here.

If anything is missing from the docs or you want more information, feel free to drop me an email.

Nigel
