I am trying to load millions of nodes from CSV files into Titan 1.0.0 with a Cassandra backend, using Java. How can I load them?
I found that they can be loaded with BulkLoaderVertexProgram, but it reads the data in GraphSON format.
How do I start writing Java code to bulk load the data from CSV? Can you point me to a reference I can look into to get started?
Do I need Spark/Hadoop running on my system to use SparkGraphComputer, which is used by BulkLoaderVertexProgram?
I have not been able to start writing code because I do not understand how to read data from CSV using BulkLoaderVertexProgram. Can you provide some starting links for the Java code?
Thanks.
You probably need a custom Java program that reads your CSV files and loads their contents into the graph.
If you want to use an OGM (object-graph mapper), which means creating POJO classes as the data model for your data, you could use Peapod to create that data model easily.
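A custom loader of this kind could be sketched roughly as follows. This is only an illustration using the Java standard library: `CsvVertexLoader` and the `VertexSink` interface are hypothetical names standing in for the graph, not part of Titan or TinkerPop, and the naive comma split assumes the CSV has a header row and no quoted or escaped fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads a header-first CSV and hands each row to a sink, committing in batches. */
final class CsvVertexLoader {

    /** Hypothetical stand-in for the graph; not a Titan or TinkerPop type. */
    interface VertexSink {
        void addVertex(Map<String, String> properties);
        void commit();
    }

    /** Loads every data row and returns the number of rows passed to the sink. */
    static long load(Reader csv, VertexSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        try {
            BufferedReader in = new BufferedReader(csv);
            String headerLine = in.readLine();
            if (headerLine == null) {
                return 0; // empty input: nothing to load
            }
            String[] header = headerLine.split(",", -1);
            long count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: assumes no quoted fields or embedded commas.
                String[] cells = line.split(",", -1);
                Map<String, String> props = new LinkedHashMap<>();
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    props.put(header[i].trim(), cells[i].trim());
                }
                sink.addVertex(props);
                count++;
                // Commit every batchSize rows so one transaction does not hold millions of vertices.
                if (count % batchSize == 0) {
                    sink.commit();
                }
            }
            if (count % batchSize != 0) {
                sink.commit(); // commit the final partial batch
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real loader, the sink's `addVertex` would presumably wrap the graph's `addVertex(...)` call and `commit` would wrap `graph.tx().commit()`. The batch size is something to tune against your own data and backend.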
So this is an example
To load data, this is an example,
Easier than you thought? Hope so.
How about converting the CSV to GraphML and then loading it in one go using Gremlin?
Wouldn't that be more performant than making a Gremlin call for each addVertex/addEdge?
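For illustration, the CSV-to-GraphML conversion step might look something like the sketch below, which uses only the JDK's `XMLStreamWriter`. The class name `CsvToGraphMl` is hypothetical, and the sketch assumes the first CSV column is the vertex id, every other column is a string property, and there are no quoted fields. Vertex labels and edges are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Converts a header-first CSV of vertices into a GraphML document. */
final class CsvToGraphMl {

    /** Writes one node element per CSV row and returns the number of nodes written. */
    static int convert(Reader csv, Writer out) {
        try {
            BufferedReader in = new BufferedReader(csv);
            String headerLine = in.readLine();
            if (headerLine == null) {
                throw new IllegalArgumentException("CSV has no header row");
            }
            String[] header = headerLine.split(",", -1);

            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("graphml");
            xml.writeDefaultNamespace("http://graphml.graphdrawing.org/xmlns");

            // Declare one string-typed property key per column after the id column.
            for (int i = 1; i < header.length; i++) {
                String name = header[i].trim();
                xml.writeEmptyElement("key");
                xml.writeAttribute("id", name);
                xml.writeAttribute("for", "node");
                xml.writeAttribute("attr.name", name);
                xml.writeAttribute("attr.type", "string");
            }

            xml.writeStartElement("graph");
            xml.writeAttribute("id", "G");
            xml.writeAttribute("edgedefault", "directed");

            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: assumes no quoted fields or embedded commas.
                String[] cells = line.split(",", -1);
                xml.writeStartElement("node");
                xml.writeAttribute("id", cells[0].trim()); // assumption: column 0 is the id
                for (int i = 1; i < header.length && i < cells.length; i++) {
                    xml.writeStartElement("data");
                    xml.writeAttribute("key", header[i].trim());
                    xml.writeCharacters(cells[i].trim()); // escapes &, < and > as needed
                    xml.writeEndElement();
                }
                xml.writeEndElement(); // closes the node element
                count++;
            }
            xml.writeEndDocument(); // closes the graph and graphml elements
            xml.flush();
            xml.close();
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to write GraphML", e);
        }
    }
}
```

The resulting file could then be handed to a GraphML reader. Whether that is actually faster than individual addVertex/addEdge calls for millions of nodes is something you would need to measure on your own data.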
This was cross-posted on the Titan mailing list...
If you're looking to use Java code, check out Alex's and Matthew's Marvel graph example:
https://github.com/awslabs/dynamodb-titan-storage-backend/blob/1.0.0/src/main/java/com/amazon/titan/example/MarvelGraphFactory.java
It creates a Titan schema, parses a CSV, and then uses basic Gremlin addVertex() and addEdge() to build the graph. You'll notice that the TitanGraph isn't instantiated in the factory itself, so even though it is inside a Titan-DynamoDB example, you can use this with any Titan backend (Cassandra, HBase, Berkeley).
If your graph data is in the low millions, you could use a Titan-BerkeleyJE graph on your own machine, which might be an easier backend to use at first rather than a Cassandra cluster. I'd recommend that you do not get too caught up on loading a lot of data initially -- get comfortable with how to use Titan and TinkerPop with OLTP first and then move into OLAP approaches.