Giraph's best Vertex Input format, for an inpu

I have a multinode Giraph cluster working properly on my PC. I ran the SimpleShortestPathExample from Giraph and it executed fine.

The algorithm was run with this file (tiny_graph.txt):

[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]

This file has the following input format:

[source_id,source_value,[[dest_id, edge_value],...]]

Now, I’m trying to execute this same algorithm, in this same cluster, but with an input file different from the original. My own file is like this:

[Portada,0,[[Sugerencias para la cita del día,1]]]
[Proverbios españoles,0,[]]
[Neil Armstrong,0,[[Luna,1][ideal,1][verdad,1][Categoria:Ingenieros,2,[Categoria:Estadounidenses,2][Categoria:Astronautas,2]]]
[Categoria:Ingenieros,1,[[Neil Armstrong,2]]]
[Categoria:Estadounidenses,1,[[Neil Armstrong,2]]]
[Categoria:Astronautas,1,[[Neil Armstrong,2]]]

It's very similar to the original, but the ids are Strings and the vertex and edge values are Longs. My question is which TextInputFormat I should use for this, because I have already tried org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat and org.apache.giraph.io.formats.TextDoubleDoubleAdjacencyListVertexInputFormat and couldn't get either of them working.

With this problem solved, I could adapt the original shortest path example algorithm and make it work for my file, but until I have a solution for this I can't get to that point.

If this format is not a good choice, I could perhaps adapt it, but I don't know what my best option is. My knowledge of text input and output formats in Giraph is really poor, which is why I come here asking for advice.

Tags: hadoop giraph

2 answers

等我变得足够好

#2 · 2019-08-06 03:03

I solved this by adapting my own file to fit org.apache.giraph.io.formats.TextDoubleDoubleAdjacencyListVertexInputFormat. My original file should look like this:

Portada 0.0     Sugerencias     1.0
Proverbios      0.0
Neil    0.0     Luna    1.0     ideal   1.0     verdad  1.0     Categoria:Ingenieros    2.0     Categoria:Estadounidenses       2.0     Categoria:Astronautas   2.0
Categoria:Ingenieros    1.0     Neil    2.0
Categoria:Estadounidenses       1.0     Neil    2.0
Categoria:Astronautas   1.0     Neil    2.0

The gaps between the fields are tab characters ('\t'), because this format uses a tab as its default token separator for splitting each line into several strings.
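
To make the layout above concrete, here is a minimal sketch in plain Java that builds one such tab-separated line and splits it back into edges. It does not call Giraph at all: the class name AdjacencyLineSketch and both methods are made up for illustration, and the real parsing inside TextDoubleDoubleAdjacencyListVertexInputFormat may differ in its details.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Giraph: illustrates the line layout only.
class AdjacencyLineSketch {

  /** Builds one line: id, vertex value, then (neighbor, edge value) pairs, all tab-separated. */
  static String formatLine(String id, double value, Map<String, Double> edges) {
    StringBuilder sb = new StringBuilder();
    sb.append(id).append('\t').append(value);
    for (Map.Entry<String, Double> e : edges.entrySet()) {
      sb.append('\t').append(e.getKey()).append('\t').append(e.getValue());
    }
    return sb.toString();
  }

  /** Splits a line on tabs and returns neighbor -> edge value, in file order. */
  static Map<String, Double> parseEdges(String line) {
    String[] tokens = line.split("\t");
    // Expect id + value, followed by complete (neighbor, edge value) pairs.
    if (tokens.length < 2 || tokens.length % 2 != 0) {
      throw new IllegalArgumentException("Malformed line: " + line);
    }
    Map<String, Double> edges = new LinkedHashMap<>();
    for (int i = 2; i < tokens.length; i += 2) {
      edges.put(tokens[i], Double.parseDouble(tokens[i + 1]));
    }
    return edges;
  }

  public static void main(String[] args) {
    Map<String, Double> edges = new LinkedHashMap<>();
    edges.put("Luna", 1.0);
    edges.put("Categoria:Ingenieros", 2.0);
    String line = formatLine("Neil", 0.0, edges);
    System.out.println(line);
    System.out.println(parseEdges(line));
  }
}
```

A generator like formatLine is one way to produce the converted file from the original data, so that no tabs get replaced by spaces along the way.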

Thanks @masoud-sagharichian for your help anyway!! :D


女痞

#3 · 2019-08-06 03:07

It's better to write your own input format. I suggest using the hash codes of your strings. I wrote some sample code in which each line consists of: [vertex_id (integer, e.g. the hash code of your string), vertex_val (long), [[neighbor_id (integer), neighbor_val (long)], ...]]

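import java.io.IOException;
import java.util.List;

import com.google.common.collect.Lists;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.edge.EdgeFactory;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.io.formats.TextVertexInputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.json.JSONArray;
import org.json.JSONException;

/** Reads one JSON array per line and hashes the string ids to int vertex ids. */
public class JsonIntLongIntLongVertexInputFormat extends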
  TextVertexInputFormat<IntWritable, LongWritable, LongWritable> {

  @Override
  public TextVertexReader createVertexReader(InputSplit split,
      TaskAttemptContext context) {
    return new JsonIntLongIntLongVertexReader();
  }


  class JsonIntLongIntLongVertexReader extends
    TextVertexReaderFromEachLineProcessedHandlingExceptions<JSONArray,
    JSONException> {

    @Override
    protected JSONArray preprocessLine(Text line) throws JSONException {
      return new JSONArray(line.toString());
    }

    @Override
    protected IntWritable getId(JSONArray jsonVertex) throws JSONException,
              IOException {
      return new IntWritable(jsonVertex.getString(0).hashCode());
    }

    @Override
    protected LongWritable getValue(JSONArray jsonVertex) throws
      JSONException, IOException {
      return new LongWritable(jsonVertex.getLong(1));
    }

    @Override
    protected Iterable<Edge<IntWritable, LongWritable>> getEdges(
        JSONArray jsonVertex) throws JSONException, IOException {
      JSONArray jsonEdgeArray = jsonVertex.getJSONArray(2);
      List<Edge<IntWritable, LongWritable>> edges =
          Lists.newArrayListWithCapacity(jsonEdgeArray.length());
      for (int i = 0; i < jsonEdgeArray.length(); ++i) {
        JSONArray jsonEdge = jsonEdgeArray.getJSONArray(i);
        edges.add(EdgeFactory.create(new IntWritable(jsonEdge.getString(0).hashCode()),
            new LongWritable(jsonEdge.getLong(1))));
      }
      return edges;
    }

    @Override
    protected Vertex<IntWritable, LongWritable, LongWritable>
    handleException(Text line, JSONArray jsonVertex, JSONException e) {
      throw new IllegalArgumentException(
          "Couldn't get vertex from line " + line, e);
    }

  }
}
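
One caveat with hashing: String.hashCode() is not collision-free, so two different names can end up with the same vertex id (for example, "Aa" and "BB" both hash to 2112). Below is a rough sketch of a check you could run over your ids in a preprocessing step, before building the input file. VertexIdMapper is a hypothetical helper, not part of Giraph, and it assumes all the ids fit in memory on one machine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Giraph: detects hashCode() collisions among vertex names.
class VertexIdMapper {
  // Remembers which name first produced each hash code.
  private final Map<Integer, String> seen = new HashMap<>();

  /** Returns name.hashCode(), failing loudly if a different name already produced it. */
  int idFor(String name) {
    int id = name.hashCode();
    String previous = seen.putIfAbsent(id, name);
    if (previous != null && !previous.equals(name)) {
      throw new IllegalStateException(
          "Hash collision: '" + previous + "' and '" + name + "' both map to " + id);
    }
    return id;
  }

  public static void main(String[] args) {
    VertexIdMapper mapper = new VertexIdMapper();
    System.out.println(mapper.idFor("Neil Armstrong"));
    System.out.println(mapper.idFor("Aa")); // 2112
    try {
      mapper.idFor("BB"); // same hash code as "Aa"
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If a collision does show up, one option is to assign sequential integer ids from a lookup table instead of relying on the hash.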
