I am trying to read a text file containing vertex information into Giraph. Each line has the form
vertex_id attribute_1 attribute_2 .....attribute_n
where each attribute is a string.
The goal is to create a vertex where all these attributes are part of the vertex's value.
Looking through the various input formats, I could not find anything that works out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges).
The problem is: how? I have created a Value class which contains a String[] array, but how do I hand it over to Giraph/Hadoop? Here is the method that reads a single line:
protected abstract V getValue(org.apache.hadoop.io.Text line)
My thought was that V would be an ArrayWritable, but Giraph does not seem to accept it.
Any clue? Thanks
If your vertex has a custom value (in your case, an array of strings), then you need a custom vertex value class and a custom vertex input format. As an example, take a look at a very simple custom vertex value class. This class has a double value, an int, and a long: https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60 Note: you must override readFields() and write() accordingly.
Then you need a custom vertex input format. For the above vertex class, I have modified the built-in JSON vertex reader a little. Here is the example: https://gist.github.com/sar-vivek/f39edacec6d9a43c3717 (notice how the value of a vertex is set on line 68).
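For the string-array case in the question, a value class along the same lines might look like the sketch below. It is only an illustration, not the code from the gists: the class name StringArrayValue and the helper fromLine are hypothetical, and the sketch deliberately avoids Hadoop imports so that it compiles on its own. In a real job the class would be public and declare "implements org.apache.hadoop.io.Writable"; the write() and readFields() signatures below match that interface. One likely reason a bare ArrayWritable is rejected is that Hadoop instantiates values reflectively through a no-argument constructor, which ArrayWritable itself does not have.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical vertex value holding the string attributes of one vertex.
// Assumption: in a Giraph job this class would be public and would add
// "implements org.apache.hadoop.io.Writable".
class StringArrayValue {
    private String[] attributes = new String[0];

    // Hadoop creates values reflectively, so a no-arg constructor is needed.
    public StringArrayValue() {}

    public StringArrayValue(String[] attributes) {
        this.attributes = attributes.clone();
    }

    public String[] getAttributes() {
        return attributes.clone();
    }

    // Serialize as: element count, then each string.
    // Note: writeUTF limits each string to 65535 encoded bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(attributes.length);
        for (String attribute : attributes) {
            out.writeUTF(attribute);
        }
    }

    // Deserialize in exactly the order used by write().
    public void readFields(DataInput in) throws IOException {
        int count = in.readInt();
        String[] read = new String[count];
        for (int i = 0; i < count; i++) {
            read[i] = in.readUTF();
        }
        attributes = read;
    }

    // Parses "vertex_id attribute_1 ... attribute_n" and keeps only the
    // attributes; the vertex id (first token) is handled elsewhere.
    static StringArrayValue fromLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        return new StringArrayValue(Arrays.copyOfRange(tokens, 1, tokens.length));
    }
}
```

With a class like this, the getValue(Text line) method from the question could plausibly return StringArrayValue.fromLine(line.toString()), with V bound to StringArrayValue in the input format's type parameters.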