Assuming that I have the following RDDs:
a = sc.parallelize([1, 2, 5, 3])
b = sc.parallelize(['a','c','d','e'])
How do I combine these two RDDs into one RDD that looks like this:
[('a', 1), ('c', 2), ('d', 5), ('e', 3)]
Using a.union(b)
just concatenates them into a single list. Any ideas?
You probably just want to zip the two RDDs:
b.zip(a)
Note the reversed order, since you want to key by b's values. Just read the Python docs carefully: