I want to sort my (K, V) tuples by V, i.e. by the value. I know that takeOrdered
is good for this if you know how many elements you need:
b = sc.parallelize([('t',3),('b',4),('c',1)])
Using takeOrdered:
b.takeOrdered(3,lambda atuple: atuple[1])
Using lambdas with map and sortByKey:
# Swap to (V, K), sort by the new key, then swap back to (K, V).
b.map(lambda aTuple: (aTuple[1], aTuple[0])).sortByKey().map(
    lambda aTuple: (aTuple[1], aTuple[0])).collect()
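To make the goal concrete, here is a plain-Python sketch (no Spark involved) of the result I'm after on the sample data. The name `data` is just a local stand-in for the RDD, and `heapq.nsmallest` and `sorted` are only local analogues of takeOrdered and the swap, sort, swap-back approach, not the Spark calls themselves:

```python
import heapq

# Local stand-in for the RDD contents (hypothetical name, not a Spark object).
data = [('t', 3), ('b', 4), ('c', 1)]

# Analogue of takeOrdered(3, key): the 3 smallest tuples ordered by value.
top3_by_value = heapq.nsmallest(3, data, key=lambda t: t[1])

# Analogue of swap -> sort by key -> swap back.
swapped = [(v, k) for (k, v) in data]
sorted_by_value = [(k, v) for (v, k) in sorted(swapped)]

print(top3_by_value)    # [('c', 1), ('t', 3), ('b', 4)]
print(sorted_by_value)  # [('c', 1), ('t', 3), ('b', 4)]
```

Both give the tuples in ascending order of value, which is the ordering I want from a single Spark transformation.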
I've checked out the question here, which suggests the latter. I find it hard to believe that takeOrdered
is so succinct and yet requires the same number of operations as the lambda-based map/sortByKey
solution.
Does anyone know of a simpler, more concise transformation in Spark to sort by value?