PySpark reduceByKey on multiple values

Posted 2020-04-26 04:35

Question:

If I have key-value pairs like:

(K, (v1, v2))
(K, (v3, v4))

How can I sum the values per key so that I get (K, (v1 + v3, v2 + v4))?

Answer 1:

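reduceByKey takes a function that merges two values sharing a key into a single value of the same shape, so here it must return a tuple. Let's say A is the RDD of key-value pairs.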

output = A.reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
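
To see what the reducer does without starting Spark, here is a minimal sketch. The names add_pairs and reduce_by_key_local are hypothetical, introduced only for illustration; reduce_by_key_local is a plain-Python stand-in that mimics the per-key merging, not Spark's actual implementation.

```python
def add_pairs(x, y):
    """Element-wise sum of two 2-tuples; the result is again a 2-tuple."""
    return (x[0] + y[0], x[1] + y[1])


def reduce_by_key_local(pairs, func):
    """Hypothetical local stand-in for reduceByKey (illustration only).

    Folds the values of each key together with func, one pair at a time.
    """
    merged = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = func(merged[key], value)
        else:
            merged[key] = value
    return merged


# Sample data in the same (K, (v1, v2)) shape as the question.
data = [("a", (1, 2)), ("a", (3, 4)), ("b", (5, 6))]
totals = reduce_by_key_local(data, add_pairs)
print(totals)
```

With a real RDD, the same function could be passed directly, for example A.reduceByKey(add_pairs). Because the function returns a tuple of the same shape as its inputs, it can be applied repeatedly as more values for a key are merged.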


Tags: pyspark