I need to write my JavaPairRDD out as a CSV,
so I am thinking of transforming it into a plain RDD to solve my problem.
What I want is to transform my RDD from:
Key Value
Jack [a,b,c]
to:
Key Value
Jack a
Jack b
Jack c
I see that this is possible in that question and in this one (PySpark: Convert a pair RDD back to a regular RDD), so I am asking how to do it in Java.
Update:
The type of my JavaPairRDD is:
JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>>
and this is what a row looks like:
((dr5rvey,dr5ruku),[(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)])
The key here is: (dr5rvey,dr5ruku)
and the value is: [(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)]
My original JavaRDD was of type:
JavaRDD<String>
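Since the end goal is a CSV, one way to read the updated type is: each element of the Iterable becomes one output line that starts with the two key fields. The sketch below shows only that per-record formatting, in plain Java so it runs without Spark. It assumes each Tuple1 wraps one already comma-separated line, as the sample row suggests; the class and method names (CsvRows, toCsvLines) are hypothetical. In a real job, k1 and k2 would come from key._1() and key._2(), and each value from t._1() on the Tuple1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: formats one (key, values) record of a
// JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>> as CSV lines.
// The Scala tuples are replaced by plain Strings so this runs without Spark.
class CsvRows {
    static List<String> toCsvLines(String k1, String k2, Iterable<String> values) {
        List<String> lines = new ArrayList<>();
        for (String v : values) {
            // One output line per element of the Iterable, key fields first.
            // No quoting or escaping is done; v is assumed to already be CSV text.
            lines.add(k1 + "," + k2 + "," + v);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shortened version of the sample value from the question.
        List<String> values = List.of("2,01/09/2013 00:09,01/09/2013 00:27,N,1");
        for (String line : toCsvLines("dr5rvey", "dr5ruku", values)) {
            System.out.println(line);
        }
    }
}
```

In Spark, the same logic could run inside a flatMap over the pair RDD, with the resulting JavaRDD<String> written out via saveAsTextFile; the exact functional interface flatMap expects depends on your Spark version, so check its Javadoc.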
Since the keys should be kept, you can use the flatMapValues function:
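flatMapValues keeps each key and pairs it with every element its function returns, so one input pair can produce several output pairs. Below is a plain-Java sketch of that behaviour on an in-memory list, using the Jack example from the question. The flatMapValues method here is a hypothetical local stand-in written for illustration, not Spark's own method, and it needs Java 9+ for List.of and Map.entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FlatMapValuesDemo {
    // Local stand-in for JavaPairRDD.flatMapValues: the key is copied onto
    // every value produced by f, so one input pair can yield many output pairs.
    static <K, V, U> List<Map.Entry<K, U>> flatMapValues(
            List<Map.Entry<K, V>> pairs, Function<V, Iterable<U>> f) {
        List<Map.Entry<K, U>> out = new ArrayList<>();
        for (Map.Entry<K, V> p : pairs) {
            for (U u : f.apply(p.getValue())) {
                out.add(Map.entry(p.getKey(), u));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map.Entry<String, List<String>> jack = Map.entry("Jack", List.of("a", "b", "c"));
        // Identity function: the value is already the collection to flatten.
        List<Map.Entry<String, String>> rows = flatMapValues(List.of(jack), v -> v);
        rows.forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

Running main prints "Jack a", "Jack b" and "Jack c" on separate lines. With a real JavaPairRDD the call would be pairRdd.flatMapValues(...) with a similar identity-style function, but the functional interface it takes differs between Spark versions, so check the Javadoc for yours.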
The type of your RDD is RDD[(String, Array[String])], if I am getting this right. So you can just apply flatMap to this RDD; newRDD will be of type RDD[(String, String)].
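With a plain flatMap, unlike flatMapValues, the function receives the whole (key, value) pair and must re-emit the key itself for every element. The following is a plain-Java sketch of that idea on an in-memory list of (String, String[]) pairs, standing in for RDD[(String, Array[String])]; FlatMapPairsDemo and explode are hypothetical names, and Java 9+ is assumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMapPairsDemo {
    // Plain flatMap over pairs: each (key, [v1, v2, ...]) becomes
    // (key, v1), (key, v2), ... with the key re-emitted explicitly.
    static List<Map.Entry<String, String>> explode(List<Map.Entry<String, String[]>> pairs) {
        return pairs.stream()
                .flatMap(p -> Arrays.stream(p.getValue())
                        .map(v -> Map.entry(p.getKey(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String[]>> input =
                List.of(Map.entry("Jack", new String[] {"a", "b", "c"}));
        for (Map.Entry<String, String> e : explode(input)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The result is the same as with flatMapValues; the difference is only in who carries the key, which matters if the function also needs to look at or change the key.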
If I understand correctly, you need to use the flatMap function, which lets you create multiple rows from a single key. Here is an example in Scala (just the idea; you'll need to adapt it to your use case):
It's a very simplified example, but you should get the gist.
For the RDD:
the returned RDD will be: