How can I cross combine (is this the correct way to describe it?) two RDDs?
input:
rdd1 = [a, b]
rdd2 = [c, d]
output:
rdd3 = [(a, c), (a, d), (b, c), (b, d)]
I tried rdd3 = rdd1.flatMap(lambda x: rdd2.map(lambda y: (x, y))), but it complains: "It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation." I guess that means you cannot nest actions as in a list comprehension, and one statement can only do one action.
You can use the cartesian transformation. Here's an example from the documentation:
In your case, you would do:
rdd3 = rdd1.cartesian(rdd2)
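To see which pairs this should produce without starting Spark, here is a minimal local sketch in plain Python using itertools.product. The lists left and right are placeholder stand-ins for the contents of rdd1 and rdd2, not part of the original question, and the sketch only mirrors the pairing, not Spark's distributed execution.

```python
from itertools import product

# Placeholder values standing in for the contents of rdd1 and rdd2.
left = ["a", "b"]
right = ["c", "d"]

# Pair every element of `left` with every element of `right`.
# This is the same set of pairs that rdd1.cartesian(rdd2) is expected to hold.
pairs = list(product(left, right))

print(pairs)
```

On real RDDs, cartesian returns another RDD, so nothing is computed until you call an action such as collect(), and the order of the collected pairs may differ from this local version.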
So, as you have noticed, you can't perform a transformation inside another transformation (note that flatMap and map are transformations rather than actions, since they return RDDs). Thankfully, what you're trying to accomplish is directly supported by another transformation in the Spark API, namely cartesian (see http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD ). So you would want to do rdd1.cartesian(rdd2).