My RDD is made of many items, each of which is a tuple as follows:
(key1, (val1_key1, val2_key1))
(key2, (val1_key2, val2_key2))
(key1, (val1_again_key1, val2_again_key1))
... and so on
I used groupByKey() on the RDD, which gave the following result:
(key1, [(val1_key1, val2_key1), (val1_again_key1, val2_again_key1), (), ... ()])
(key2, [(val1_key2, val2_key2), (), ... ()])
... and so on
I need to produce the same result using reduceByKey(). I tried:
RDD.reduceByKey(lambda val1, val2: list(val1).append(val2))
but it doesn't work.
Please suggest the right way to implement this using reduceByKey().
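
For reference, here is a minimal sketch of what seems to go wrong in the attempt above, and one possible way around it. It runs in plain Python without Spark: `reduce_by_key` is a hypothetical local stand-in for `RDD.reduceByKey`, and the sample keys and values are made up for illustration. The likely cause is that `list.append` modifies the list in place and returns `None`, so the lambda returns `None` instead of the combined list.

```python
def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey (illustration only)."""
    merged = {}
    for key, value in pairs:
        # Combine with the running result for this key, or start with the value itself
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Made-up sample data in the same shape as the RDD items above
pairs = [
    ("key1", ("a1", "b1")),
    ("key2", ("c1", "d1")),
    ("key1", ("a2", "b2")),
]

# The original attempt: append() returns None, so the reduced value becomes None
broken = reduce_by_key(pairs, lambda val1, val2: list(val1).append(val2))

# One possible fix: wrap every value in a one-element list first,
# then concatenate lists, so the function always takes and returns a list
wrapped = [(key, [value]) for key, value in pairs]
fixed = reduce_by_key(wrapped, lambda a, b: a + b)

print(broken["key1"])
print(fixed)
```

On an actual RDD, the same idea would presumably be written as `RDD.mapValues(lambda v: [v]).reduceByKey(lambda a, b: a + b)`, where `mapValues` does the wrapping step.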