Write JavaPairRDD to CSV

Posted 2019-10-05 14:19

Question:

JavaPairRDD has a saveAsTextFile function, which saves the data in a text format.

However, I need to save the data as a CSV file so that I can use it later with Neo4j.

My question is:

How can I save the JavaPairRDD's data in CSV format? Alternatively, is there a way to transform the RDD from:

Key   Value
Jack  [a,b,c]

to:

Key   Value
Jack  a
Jack  b
Jack  c

Answer 1:

You should use the flatMapValues function on your JavaPairRDD. From its documentation: "Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD's partitioning."

If the function simply returns the value (the list) unchanged, the result contains one record per element of each input list, with the key preserved.

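  // In Java (Spark 2.x API, where flatMapValues takes a Function<V, Iterable<U>>)
  JavaPairRDD<Object, List<String>> input = ...;
  // Returning each list unchanged emits one (key, element) pair per list element
  JavaPairRDD<Object, String> output = input.flatMapValues(list -> list);
  // If your Spark version's flatMapValues expects a FlatMapFunction (which returns an Iterator), use: list -> list.iterator()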
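Once the pairs are flattened, each (key, value) pair still has to become one line of CSV text before it is saved. The following is a minimal sketch, assuming plain Java with no Spark dependency so that it runs standalone. It expands each key's list into one row per element, which is what flatMapValues(list -> list) does, and formats each row as a CSV line, quoting fields in the RFC 4180 style. The class and method names (PairsToCsv, csvField, toCsvLine, flattenToCsv) are hypothetical, and the Map input stands in for the RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairsToCsv {

    // Quote a field only when it contains a comma, double quote, or line break;
    // embedded double quotes are doubled, as RFC 4180 describes.
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Format one (key, value) pair as a single CSV line.
    static String toCsvLine(String key, String value) {
        return csvField(key) + "," + csvField(value);
    }

    // Plain-Java stand-in for flatMapValues(list -> list) followed by a map to CSV lines:
    // every element of a key's list becomes its own "key,element" line (no header row).
    static List<String> flattenToCsv(Map<String, List<String>> input) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : input.entrySet()) {
            for (String v : e.getValue()) {
                lines.add(toCsvLine(e.getKey(), v));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("Jack", Arrays.asList("a", "b", "c"));
        // Prints Jack,a then Jack,b then Jack,c, one per line
        flattenToCsv(input).forEach(System.out::println);
    }
}
```

In Spark, the same helper could be applied to the flattened RDD with something like `output.map(t -> PairsToCsv.toCsvLine(String.valueOf(t._1()), t._2())).saveAsTextFile(path)`, where `path` is a placeholder. saveAsTextFile writes a directory of part files rather than a single CSV file, and it does not add a header row.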