Spark: Sort an RDD by multiple values in a tuple

Posted 2019-07-10 03:00

I have an RDD of the following type:

    RDD[(String, Int, String)]

For example, it might contain:

    ('b', 1, 'a')
    ('a', 1, 'b')
    ('a', 0, 'b')
    ('a', 0, 'a')

The final result should be sorted by the first element, then the second, then the third:

    ('a', 0, 'a')
    ('a', 0, 'b')
    ('a', 1, 'b')
    ('b', 1, 'a')

How can I sort the RDD this way?

1 answer
爱情/是我丢掉的垃圾
Reply #2 · 2019-07-10 03:39

Sort on the whole tuple:

    rdd.sortBy(r => r)
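This works because Scala supplies an implicit Ordering for tuples whose elements are themselves ordered. It compares the first element, then the second, then the third, and sortBy on an RDD takes its Ordering implicitly as well. A minimal sketch of the same comparison on a local Seq, so it runs without Spark (the object and value names are only illustrative, and the rows are written with double-quoted strings to match the declared (String, Int, String) type):

```scala
object TupleOrderingDemo {
  // Sample rows from the question, typed as (String, Int, String).
  val rows: Seq[(String, Int, String)] = Seq(
    ("b", 1, "a"),
    ("a", 1, "b"),
    ("a", 0, "b"),
    ("a", 0, "a")
  )

  // Using the tuple itself as the key picks up the implicit
  // Ordering[(String, Int, String)]: compare _1, then _2, then _3.
  val sorted: Seq[(String, Int, String)] = rows.sortBy(r => r)

  def main(args: Array[String]): Unit =
    sorted.foreach(println)
}
```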

To give the fields a different priority, build the key tuple in the order you want them compared:

    rdd.sortBy(r => (r._3, r._1, r._2))
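The key function only decides what is compared; the rows themselves come back unchanged. As a rough illustration on a local Seq rather than an RDD (names are hypothetical), sorting the question's sample rows by third field, then first, then second looks like this:

```scala
object ReorderedKeyDemo {
  // Same sample rows as in the question.
  val rows: Seq[(String, Int, String)] = Seq(
    ("b", 1, "a"),
    ("a", 1, "b"),
    ("a", 0, "b"),
    ("a", 0, "a")
  )

  // Key is (third, first, second); the original tuples are returned as-is.
  val byThirdFirstSecond: Seq[(String, Int, String)] =
    rows.sortBy(r => (r._3, r._1, r._2))

  def main(args: Array[String]): Unit =
    byThirdFirstSecond.foreach(println)
}
```

Rows whose third field is "a" come first, and ties are then broken by the first and second fields.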

For descending order, set the ascending parameter to false:

    rdd.sortBy(r => r, ascending = false)
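The second argument of sortBy on an RDD is the ascending flag, so false reverses the whole tuple comparison rather than any single field. Scala collections have no such flag, so the sketch below approximates the same result locally by reversing the tuple Ordering (an equivalent chosen for illustration, not what Spark does internally; names are hypothetical):

```scala
object DescendingDemo {
  // Same sample rows as in the question.
  val rows: Seq[(String, Int, String)] = Seq(
    ("b", 1, "a"),
    ("a", 1, "b"),
    ("a", 0, "b"),
    ("a", 0, "a")
  )

  // Reverse the implicit tuple ordering to sort all fields descending.
  val descending: Seq[(String, Int, String)] =
    rows.sortBy(r => r)(Ordering[(String, Int, String)].reverse)

  def main(args: Array[String]): Unit =
    descending.foreach(println)
}
```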