How to load millions of vertices from CSV into Tit

I am trying to load millions of nodes from CSV files into Titan 1.0.0 with a Cassandra backend, using Java. How can I load them?

I saw that this can be done with BulkLoaderVertexProgram, but it loads the data from GraphSON format.

How do I start writing Java code to bulk load the data from CSV? Can you point me to a reference I can use as a starting point?

Do I need to have Spark/Hadoop running on my system to use SparkGraphComputer, which BulkLoaderVertexProgram relies on?

I have not been able to start writing code because I do not understand how to read data from CSV using BulkLoaderVertexProgram. Can you provide some links to help me get started with the Java code?

Thanks.

Tags: graph-databases titan bulkloader

3 answers

傲

Reply #2 · 2020-04-20 06:11

You will probably need a custom Java program that reads your CSV files and loads their contents into the graph.

If you want to use an OGM, meaning you create POJO classes as the data model for your data, you could use Peapod to define that model easily.

Here is an example:

@Vertex
public abstract class Person {
  public abstract String getName();
  public abstract void setName(String name);

  public abstract List<Knows> getKnows();
  public abstract Knows getKnows(Person person);
  public abstract Knows addKnows(Person person);
  public abstract Knows removeKnows(Person person);
}

@Edge
public abstract class Knows {
  public abstract void setYears(int years);
  public abstract int getYears();
}

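To load data, here is an example:

// Assumption: depending on the Peapod version, FramedGraph may also
// expect the model package as a second constructor argument.
FramedGraph g = new FramedGraph(TitanFactory.open("path_to_prop_file"));
Person person1 = g.addVertex(Person.class);
person1.setName("M-T-A");

Person person2 = g.addVertex(Person.class);
person2.setName("Amnesiac");

// person1 knows person2
Knows pKnowsP2 = person1.addKnows(person2);
pKnowsP2.setYears(1);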

Easier than you thought? Hope so.

叛逆

Reply #3 · 2020-04-20 06:12

How about converting the CSV into GraphML and then loading it all at once from the Gremlin console?

gremlin> g = TitanFactory.open('bin/cassandra.local')
gremlin> g.loadGraphML('data/graph-of-the-gods.xml')
gremlin> g.commit()

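Wouldn't that perform better than making a Gremlin call for each addVertex/addEdge? Note that loadGraphML() and g.commit() are the TinkerPop 2 (Titan 0.x) console calls; on Titan 1.0.0, which is built on TinkerPop 3, the equivalent should be graph.io(IoCore.graphml()).readGraph(...) followed by graph.tx().commit().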
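For the conversion step itself, here is a minimal sketch that turns a vertex CSV with a header row into GraphML using only the Java standard library. The class name CsvToGraphMl, the column layout (first column is the vertex id, every other column becomes a string property), and the naive comma split (no quoted fields) are assumptions made for illustration; they are not part of the original answer or of any Titan API.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

final class CsvToGraphMl {

    /** Escapes the five XML special characters. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /**
     * Streams CSV lines to GraphML. The first non-empty line is the header;
     * column 0 is used as the node id, the other columns become <data> entries.
     */
    static void convert(Iterable<String> csvLines, PrintWriter out) {
        String[] header = null;
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1); // naive split: no quoted fields
            if (header == null) {
                header = cells;
                out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
                out.println("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">");
                // Declare one string-typed key per property column.
                for (int i = 1; i < header.length; i++) {
                    String name = escape(header[i].trim());
                    out.println("  <key id=\"" + name + "\" for=\"node\" attr.name=\""
                            + name + "\" attr.type=\"string\"/>");
                }
                out.println("  <graph id=\"G\" edgedefault=\"directed\">");
                continue;
            }
            out.println("    <node id=\"" + escape(cells[0].trim()) + "\">");
            for (int i = 1; i < header.length && i < cells.length; i++) {
                out.println("      <data key=\"" + escape(header[i].trim()) + "\">"
                        + escape(cells[i].trim()) + "</data>");
            }
            out.println("    </node>");
        }
        if (header != null) {
            out.println("  </graph>");
            out.println("</graphml>");
        }
        out.flush();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; a real run would read the lines from a file.
        List<String> csv = Arrays.asList("id,name,age", "1,Alice,30", "2,Bob & Co,41");
        convert(csv, new PrintWriter(System.out));
    }
}
```

Because the converter works line by line and writes straight to a PrintWriter, it does not need to hold millions of rows in memory. For a real file, the lines could come from Files.lines(path) wrapped as an Iterable, and the PrintWriter could wrap a file writer instead of System.out.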

爷的心禁止访问

Reply #4 · 2020-04-20 06:29

This was cross-posted on the Titan mailing list...

If you're looking to use Java code, check out Alex's and Matthew's Marvel graph example:

https://github.com/awslabs/dynamodb-titan-storage-backend/blob/1.0.0/src/main/java/com/amazon/titan/example/MarvelGraphFactory.java

It creates a Titan schema, parses a CSV, and then uses basic Gremlin addVertex() and addEdge() to build the graph. You'll notice that the TitanGraph isn't instantiated in the factory itself, so even though it is inside a Titan-DynamoDB example, you can use this with any Titan backend (Cassandra, HBase, Berkeley).
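The parse-then-insert pattern described above can be sketched roughly as follows. This is not the MarvelGraphFactory code: the GraphSink interface, the CsvBatchLoader class, and the batch size are hypothetical names introduced so the example runs with the standard library alone. In a real loader, the sink's addVertex would presumably delegate to graph.addVertex(...) and its commit to graph.tx().commit() on the TitanGraph. The comma split is naive and assumes no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CsvBatchLoader {

    /** Hypothetical abstraction over the graph; not a Titan or TinkerPop type. */
    interface GraphSink {
        void addVertex(Map<String, String> properties);
        void commit();
    }

    /**
     * Reads CSV lines (first non-empty line is the header), adds one vertex per
     * row, and commits every batchSize rows plus once for any trailing rows.
     * Returns the number of vertices added.
     */
    static int load(Iterable<String> csvLines, GraphSink sink, int batchSize) {
        String[] header = null;
        int loaded = 0;
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1); // naive split: no quoted fields
            if (header == null) {
                header = cells;
                continue;
            }
            Map<String, String> props = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                props.put(header[i].trim(), cells[i].trim());
            }
            sink.addVertex(props);
            loaded++;
            if (loaded % batchSize == 0) {
                sink.commit(); // keep each transaction bounded in size
            }
        }
        if (loaded % batchSize != 0) {
            sink.commit(); // flush the final partial batch
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for a large CSV file.
        List<String> csv = Arrays.asList("name,age", "Alice,30", "Bob,41", "Carol,25");
        final List<String> events = new ArrayList<>();
        int count = load(csv, new GraphSink() {
            @Override public void addVertex(Map<String, String> p) { events.add("add " + p); }
            @Override public void commit() { events.add("commit"); }
        }, 2);
        System.out.println(count + " vertices: " + events);
    }
}
```

Committing in batches rather than once per vertex or once at the very end keeps each transaction to a manageable size; the right batch size depends on the backend and is something to tune by experiment.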

If your graph data is in the low millions, you could use a Titan-BerkeleyJE graph on your own machine, which might be an easier backend to use at first rather than a Cassandra cluster. I'd recommend that you do not get too caught up on loading a lot of data initially -- get comfortable with how to use Titan and TinkerPop with OLTP first and then move into OLAP approaches.
