I know that we can combine (like cbind in R) two RDDs as below in PySpark:
rdd3 = rdd1.zip(rdd2)
I want to perform the same for two DStreams in PySpark. Is it possible, or are there any alternatives?
In fact, I am using an MLlib random forest model to predict using Spark Streaming. In the end, I want to combine the feature DStream & prediction DStream together for further downstream processing.
Thanks in advance.
-Obaid
In the end, I am using the approach below.
The trick is using a "native Python map" along with the "Spark Streaming transform". It may not be an elegant way, but it works :).
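For anyone looking for a concrete starting point: one way to pair two DStreams batch-by-batch is `DStream.transformWith`, which exposes both micro-batches as plain RDDs so you can call `rdd1.zip(rdd2)` on them. This is a sketch of that idea, not the exact code from the original answer; the stream names are illustrative, and the per-batch pairing is simulated below with plain Python lists standing in for one micro-batch:

```python
# In PySpark Streaming, two DStreams can be combined per batch with
# transformWith, which hands both batches to your function as RDDs:
#
#   combined = feature_stream.transformWith(
#       lambda f_rdd, p_rdd: f_rdd.zip(p_rdd), prediction_stream)
#
# (feature_stream / prediction_stream are hypothetical names.)
# The pairing semantics match Python's built-in zip, simulated here
# with lists standing in for one micro-batch:

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # feature vectors in one batch
predictions = [0.0, 1.0, 1.0]                    # model output for the same batch

# pair each feature vector with its prediction, like rdd1.zip(rdd2)
combined = list(zip(features, predictions))
# combined == [([1.0, 2.0], 0.0), ([3.0, 4.0], 1.0), ([5.0, 6.0], 1.0)]
```

Note that, like `RDD.zip`, this assumes both streams produce batches with the same number of partitions and elements.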
Hope it will help somebody who is facing the same problem. If anybody has a better idea, please post it here.
-Obaid
Note: I also submitted the problem to the Spark user list and posted my answer there as well.