I need to write my JavaPairRDD out as CSV, so I am thinking of first transforming it into a regular RDD.
What I want is to have my RDD transformed
from:
Key Value
Jack [a,b,c]
to:
Key Value
Jack a
Jack b
Jack c
I see that this is possible according to another question and this one (PySpark: Convert a pair RDD back to a regular RDD),
so I am asking how to do it in Java.
Update to the question
The type of my JavaPairRDD is:
JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>>
and this is the form of a row it contains:
((dr5rvey,dr5ruku),[(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)])
The key here is: (dr5rvey,dr5ruku)
and the value is [(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)]
My original JavaRDD was of type:
JavaRDD<String>
Assuming the keys should be kept, you can use the flatMapValues function, which the documentation describes as:
Pass each value in the key-value pair RDD through a flatMap function without changing the keys; ...
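// Each key maps to an Iterable of Tuple1 values.
JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>> input = ...;
// Emit one (key, Tuple1) pair per element. On Spark 3.x the lambda must return an Iterator: iter -> iter.iterator()
JavaPairRDD<Tuple2<String, String>, Tuple1<String>> output1 = input.flatMapValues(iter -> iter);
// Unwrap each Tuple1 to get the plain String value.
JavaPairRDD<Tuple2<String, String>, String> output2 = output1.mapValues(t1 -> t1._1());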
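To see what these two steps do without a Spark cluster, here is a minimal plain-Java sketch of the same flattening on ordinary collections, followed by joining each pair into a CSV line. It does not use the Spark API: the class name FlattenPairs and the methods flattenValues and toCsvLines are made up for illustration, keys are simplified to plain strings, and the CSV joining does no quoting or escaping.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: mimics flatMapValues on plain collections.
class FlattenPairs {

    // Turns each (key, [v1, v2, ...]) entry into (key, v1), (key, v2), ...
    static List<Map.Entry<String, String>> flattenValues(Map<String, List<String>> grouped) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String value : entry.getValue()) {
                out.add(new SimpleEntry<>(entry.getKey(), value));
            }
        }
        return out;
    }

    // Joins each flattened pair into a "key,value" line (assumes nothing needs escaping).
    static List<String> toCsvLines(Map<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> pair : flattenValues(grouped)) {
            lines.add(pair.getKey() + "," + pair.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("Jack", Arrays.asList("a", "b", "c"));
        // Prints one line per (key, value) pair.
        toCsvLines(grouped).forEach(System.out::println);
    }
}
```

On the real RDD, the same line-building logic would go inside a map over output2 before saving the result as text.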
If I understand correctly, you need flatMap, which lets you create multiple rows from a single key. An example in Scala (just the idea; you'll need to adapt it to your use case):
// Split the comma-separated value and pair each piece with the original key.
rdd.flatMap(arg0 => {
  val key = arg0._1
  arg0._2.split(",").map(value => (key, value))
})
It's a super simplified example, but you should get the gist.
For this RDD:
key val
mykey "a,b,c"
the returned RDD will be:
key val
mykey "a"
mykey "b"
mykey "c"
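Since the question asks for Java, the same split-and-pair idea can be sketched in plain Java, again without Spark. The class SplitRows and the method splitValue are hypothetical names for illustration; inside a Spark job this logic would sit in the function passed to a flatMap-style call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands one (key, "a,b,c") row into one row per piece.
class SplitRows {

    // Returns {key, piece} pairs, one for each comma-separated piece of value.
    static List<String[]> splitValue(String key, String value) {
        List<String[]> rows = new ArrayList<>();
        for (String piece : value.split(",")) {
            rows.add(new String[] {key, piece});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints each expanded row as "key piece".
        for (String[] row : splitValue("mykey", "a,b,c")) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Note that String.split drops trailing empty pieces, so a value that ends in a comma does not produce an empty last row.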
If I am getting this right, the type of your RDD is RDD[(String, Array[String])], so you can just apply flatMap on this RDD.
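import org.apache.spark.rdd.RDD
val rdd: RDD[(String, Array[String])] = ???
// Pair every array element with its key, yielding one (key, value) row per element.
val newRDD = rdd.flatMap{case (key, array) => array.map(value => (key, value))}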
newRDD will be of type RDD[(String, String)].