I have two RDDS :
rdd1 [String,String,String]: Name, Address, Zipcode
rdd2 [String,String,String]: Name, Address, Landmark
I am trying to join these 2 RDDs using the function : rdd1.join(rdd2)
But I am getting an error :
error: value fullOuterJoin is not a member of org.apache.spark.rdd.RDD[String]
The join should join the RDD[String] and the output RDD should be something like :
rddOutput : Name,Address,Zipcode,Landmark
And I wanted to save these files as a JSON file in the end.
Can someone help me with the same ?
As said in the comments, you have to convert your RDDs to PairRDDs before joining, which means that each RDD must be of type
RDD[(key, value)]
. Only then you can perform the join by the key. In your case, the key is composed by (Name, Address), so you you would have to do something like:More info about PairRDD functions in the Spark's Scala API documentation