Vertices with complex values in Apache Giraph

Posted 2019-08-14 05:56

Question:

I am trying to read a text file containing vertex information into Giraph. Each line has the form:

vertex_id attribute_1 attribute_2 .....attribute_n

where each attribute is a string.

The goal is to create a vertex where all these attributes are part of the vertex's value.

Looking through the various input formats, I could not find anything that does this out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges).

The problem is: how? I have created a Value class which contains a String[] array, but how do I hand it over to Giraph/Hadoop? Here is a reader for a single line:

https://giraph.apache.org/giraph-core/apidocs/org/apache/giraph/io/formats/TextVertexValueInputFormat.TextVertexValueReaderFromEachLine.html

protected abstract V getValue(org.apache.hadoop.io.Text line)

My thought was that V would be an ArrayWritable, but that does not seem to work.

Any clue? Thanks

Answer 1:

If your vertex has a custom value (in your case, an array of strings), then you need a custom vertex value class and a custom vertex input format. As an example, take a look at this very simple custom value class, which holds a double, an int, and a long: https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60 Note that you must override readFields() and write() accordingly.
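For the String[] case from the question, a value class along these lines might look like the sketch below. The class name StringArrayValue and the roundTrip helper are hypothetical, not part of Giraph or Hadoop. To keep the sketch runnable without Hadoop on the classpath it does not declare the interface; in a real job it would additionally declare "implements org.apache.hadoop.io.Writable", whose two methods have exactly the signatures used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vertex value holding an array of strings.
// In a Giraph job, add: implements org.apache.hadoop.io.Writable
class StringArrayValue {
    private String[] attributes = new String[0];

    // Writable types are normally instantiated reflectively,
    // so a public no-argument constructor is needed.
    public StringArrayValue() { }

    public StringArrayValue(String[] attributes) {
        this.attributes = attributes.clone();
    }

    public String[] getAttributes() {
        return attributes.clone();
    }

    // Serialize as: element count, then each string.
    // writeUTF limits each string to 65535 encoded bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(attributes.length);
        for (String attribute : attributes) {
            out.writeUTF(attribute);
        }
    }

    // Must read fields in exactly the order write() emitted them.
    public void readFields(DataInput in) throws IOException {
        int count = in.readInt();
        String[] read = new String[count];
        for (int i = 0; i < count; i++) {
            read[i] = in.readUTF();
        }
        attributes = read;
    }

    // Hypothetical helper: serialize to bytes and read back into a new object.
    static StringArrayValue roundTrip(StringArrayValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            StringArrayValue copy = new StringArrayValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling StringArrayValue.roundTrip(value) is a quick way to check that write() and readFields() agree before running a job. The lack of a no-argument constructor is also a likely reason a plain ArrayWritable is rejected as V; the Hadoop documentation suggests subclassing it to fix the element type in such cases.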

Then you need a custom vertex input format. For the class above, I modified the built-in JSON vertex reader slightly. Here is the example: https://gist.github.com/sar-vivek/f39edacec6d9a43c3717 (notice how the value of a vertex is set on line 68).
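For the whitespace-separated lines in the question, the parsing that a line-based reader would do inside getValue(Text line) (and the corresponding id method) can be sketched in plain Java. The class VertexLineParser and its method names are hypothetical, and the sketch assumes that vertex ids are longs and that attributes never contain whitespace; neither is stated in the question. In a real reader the input would come from line.toString().

```java
import java.util.Arrays;

// Hypothetical helper that splits "vertex_id attribute_1 ... attribute_n".
class VertexLineParser {

    // Split a line on runs of whitespace; reject blank lines.
    static String[] tokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty vertex line");
        }
        return trimmed.split("\\s+");
    }

    // First token is the vertex id (assumed to be a long).
    static long parseId(String line) {
        return Long.parseLong(tokens(line)[0]);
    }

    // Remaining tokens are the attributes; may be an empty array.
    static String[] parseAttributes(String line) {
        String[] all = tokens(line);
        return Arrays.copyOfRange(all, 1, all.length);
    }
}
```

The array returned by parseAttributes would then be wrapped in the custom value class and returned from getValue, while parseId would feed something like a LongWritable for the vertex id.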