Is there a way to concatenate datasets of two different RDDs in Spark?
The requirement is: I create two intermediate RDDs using Scala which have the same column names. I need to combine the results of both RDDs and cache the result so it can be accessed from the UI. How do I combine the datasets here?
RDDs are of type spark.sql.SchemaRDD
I think you are looking for RDD.union
Example (in the spark-shell):
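A minimal sketch of what that session could look like; the Person case class and the sample values are illustrative, not from the original answer (in spark-shell, sc and sqlContext are predefined):

```scala
// spark-shell (Spark 1.x): sc and sqlContext are already in scope.
import sqlContext.createSchemaRDD // implicit conversion to SchemaRDD

case class Person(name: String, age: Int)

val rdd1 = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))
val rdd2 = sc.parallelize(Seq(Person("carol", 35)))

// union concatenates the rows of the two RDDs; nothing is de-duplicated
val combined = rdd1.union(rdd2)

// All rows from rdd1 followed by the rows from rdd2
combined.collect().foreach(println)
```

Both RDDs must have the same schema; union is a cheap transformation that simply appends the partitions of one RDD to the other.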
I had the same problem. To combine by row instead of by column, use unionAll:
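A short sketch of the DataFrame variant; the frames df1 and df2 and their columns are made-up names for illustration:

```scala
// Assumes a spark-shell session where sqlContext is available (Spark 1.3+).
import sqlContext.implicits._

val df1 = Seq((1, "a"), (2, "b")).toDF("id", "value")
val df2 = Seq((3, "c")).toDF("id", "value")

// unionAll appends df2's rows after df1's rows, keeping duplicates
val combined = df1.unionAll(df2)
combined.show()
```

Note that unionAll matches columns by position, not by name, so both frames need the same column order.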
I found it after reading the method summary for DataFrame. More information at: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrame.html