I have two RDDs that I need to join together. They look like the following:
RDD1
[(u'2', u'100', 2),
(u'1', u'300', 1),
(u'1', u'200', 1)]
RDD2
[(u'1', u'2'), (u'1', u'3')]
My desired output is:
[(u'1', u'2', u'100', 2)]
So I would like to select the tuples from RDD2 whose second value matches the first value of a tuple in RDD1, and append the remaining fields of that RDD1 tuple. I have tried join and also cartesian, but neither is working and I am not getting anywhere close to what I am looking for. I am new to Spark and would appreciate any help.
Thanks
Dataframe: If you are allowed to use Spark DataFrames in the solution, you can convert the given RDDs to DataFrames and join them on the corresponding columns.
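A minimal sketch of that approach, assuming PySpark with a SparkSession available; the column names ('id', 'key', 'value', 'count') are made up purely for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Data from the question; column names are hypothetical
df1 = spark.createDataFrame(
    [(u'2', u'100', 2), (u'1', u'300', 1), (u'1', u'200', 1)],
    ['key', 'value', 'count'])
df2 = spark.createDataFrame(
    [(u'1', u'2'), (u'1', u'3')],
    ['id', 'key'])

# Join RDD2's second column against RDD1's first column
joined = df2.join(df1, on='key')

# Reorder the columns to match the desired tuple layout, and go back to an RDD if needed
result = joined.select('id', 'key', 'value', 'count').rdd.map(tuple)
print(result.collect())  # [(u'1', u'2', u'100', 2)]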
RDD: just map the key that you want to join on into the first element of a (key, value) pair and simply use join to do the joining.

Your process looks manual to me. Here is sample code:
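The original sample code is not included above, so here is a minimal sketch of the RDD approach just described, assuming PySpark and the data from the question:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd1 = sc.parallelize([(u'2', u'100', 2), (u'1', u'300', 1), (u'1', u'200', 1)])
rdd2 = sc.parallelize([(u'1', u'2'), (u'1', u'3')])

# Key RDD1 by its first element and RDD2 by its second element,
# keeping the full original tuple as the value
rdd1_keyed = rdd1.map(lambda x: (x[0], x))   # e.g. (u'2', (u'2', u'100', 2))
rdd2_keyed = rdd2.map(lambda x: (x[1], x))   # e.g. (u'2', (u'1', u'2'))

# Join on the shared key, then flatten each match into a single tuple:
# the RDD2 tuple followed by the remaining fields of the matching RDD1 tuple
result = rdd2_keyed.join(rdd1_keyed).map(lambda kv: kv[1][0] + kv[1][1][1:])

print(result.collect())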
OUTPUT:
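Running the sketch above on the question's data should collect the desired result:

[(u'1', u'2', u'100', 2)]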