I have a JavaRDD that looks like this:
[
[A,8]
[B,3]
[C,5]
[A,2]
[B,8]
...
...
]
I want the result to be the mean for each key:
[
[A,5]
[B,5.5]
[C,5]
]
How do I do this using Java RDDs only? P.S.: I want to avoid a groupBy operation, so I am not using DataFrames.
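Here you go:
You can use reduceByKey to calculate the sum and the count for each key, and then divide the sum by the count for each key. Unlike groupByKey, reduceByKey combines values within each partition before the shuffle, so only one (sum, count) pair per key per partition is sent over the network.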
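A minimal sketch of that approach, assuming the data is already a JavaPairRDD<String, Integer> (String keys, Integer values) and a local Spark master for the demo. The class name MeanByKey and the method meanByKey are hypothetical names chosen for illustration. Each value is first paired with a count of 1, reduceByKey adds up both parts, and mapValues divides the sum by the count.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical helper class: computes the mean per key without groupByKey.
public class MeanByKey {

    public static JavaPairRDD<String, Double> meanByKey(JavaPairRDD<String, Integer> input) {
        // (key, value) -> (key, (value, 1)) so each record carries a count of 1
        JavaPairRDD<String, Tuple2<Integer, Integer>> withCounts =
                input.mapValues(v -> new Tuple2<>(v, 1));

        // Add up sums and counts per key; reduceByKey combines map-side before shuffling
        JavaPairRDD<String, Tuple2<Integer, Integer>> sumsAndCounts =
                withCounts.reduceByKey((a, b) -> new Tuple2<>(a._1() + b._1(), a._2() + b._2()));

        // Mean = sum / count, computed in floating point
        return sumsAndCounts.mapValues(p -> (double) p._1() / p._2());
    }

    public static void main(String[] args) {
        // Assumption: local mode for demonstration only
        SparkConf conf = new SparkConf().setAppName("MeanByKey").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> input = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("A", 8), new Tuple2<>("B", 3), new Tuple2<>("C", 5),
                    new Tuple2<>("A", 2), new Tuple2<>("B", 8)));

            // collect() order is not guaranteed after a shuffle
            List<Tuple2<String, Double>> result = meanByKey(input).collect();
            result.forEach(t -> System.out.println("[" + t._1() + "," + t._2() + "]"));
        }
    }
}
```

For the sample data this gives A = 5.0, B = 5.5 and C = 5.0, in no particular order. If the sums can grow large, switching the sum to Long avoids integer overflow. aggregateByKey or combineByKey can build the (sum, count) pair directly instead of the extra mapValues step; the shuffle behavior is the same.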