How to import a CSV file into Titan graph database

Posted 2019-01-23 21:49

Question:

Can anyone supply some sample code or hints on how to import a 1MB CSV of nodes, and another 1MB CSV of edges, into Titan graph database running on Cassandra?

I've got small CSV files importing via Gremlin, but this doesn't seem appropriate for large files.

I've seen Faunus can do this, but I'd like to avoid spending a couple of days setting it up if possible.

It looks like BatchGraph might be the way to go (https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation) but the example appears to be incomplete.

Answer 1:

My question was answered at https://groups.google.com/forum/#!topic/aureliusgraphs/ew9PJVxa8Xw :

1) A plain Gremlin script is fine for a 1 MB import (Stephen Mallette)

2) BatchGraph code (Daniel Kuppitz)

Prerequisites (sample data, created from a shell):

echo "alice,32"         > /tmp/vertices.csv
echo "bob,33"          >> /tmp/vertices.csv
echo "alice,knows,bob"  > /tmp/edges.csv

In Gremlin REPL:

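// In-memory backend for a quick test. For Cassandra, set "storage.backend"
// to "cassandra" and add "storage.hostname" (adjust to your cluster).
config = new BaseConfiguration()
config.setProperty("storage.backend", "inmemory")

g = TitanFactory.open(config)
// Wrap the graph: string vertex IDs, commit automatically every 1000 mutations
bg = new BatchGraph(g, VertexIDType.STRING, 1000)

// One vertex per CSV row: username,age
new File("/tmp/vertices.csv").eachLine({ line ->
  (username, age) = line.split(",")
  user = bg.addVertex("user::" + username)
  ElementHelper.setProperties(user, ["username":username,"age":age.toInteger()])
})

// One edge per CSV row: source,label,target
new File("/tmp/edges.csv").eachLine({ line ->
  (source, label, target) = line.split(",")

  // Look up both endpoints by the IDs assigned during vertex loading
  v1 = bg.getVertex("user::" + source)
  v2 = bg.getVertex("user::" + target)

  bg.addEdge(null, v1, v2, label)
})

// Flush whatever is left in the last, partially filled batch
bg.commit()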
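
Point 1 says a plain Gremlin script is adequate at this size. A minimal sketch of that approach follows. It assumes a freshly opened TitanGraph g (not one already loaded through BatchGraph) and the same sample files under /tmp. The vertices map is a hypothetical helper introduced here for illustration and is not part of the original answer.

```groovy
// Sketch only: assumes g = TitanFactory.open(config) on an empty graph.
// Hypothetical helper: username -> vertex, so edges can be wired without index lookups.
vertices = [:]

new File("/tmp/vertices.csv").eachLine({ line ->
  def (username, age) = line.split(",")
  def v = g.addVertex(null)              // null lets the graph assign the vertex ID
  v.setProperty("username", username)
  v.setProperty("age", age.toInteger())
  vertices[username] = v
})

new File("/tmp/edges.csv").eachLine({ line ->
  def (source, label, target) = line.split(",")
  g.addEdge(null, vertices[source], vertices[target], label)
})

// Everything runs in a single transaction, which is acceptable for about 1 MB of input
g.commit()
```

Keeping every vertex in a map works only while the whole vertex set fits in memory. For larger inputs, BatchGraph handles the ID bookkeeping and intermediate commits itself. After loading, g.V.count() can be compared with the number of rows in vertices.csv as a quick sanity check.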