I have a dataset of 10 fields, and I need to perform RDD operations on this DataFrame. Is it possible to perform RDD operations like map, flatMap, etc.? Here is my sample code:
df.select("COUNTY","VEHICLES").show();
This is my DataFrame. I need to convert it to an RDD and apply some RDD operations to the result. Here is how I am converting the DataFrame to an RDD:
RDD<Row> java = df.select("COUNTY","VEHICLES").rdd();
After converting to an RDD, I am not able to see the RDD results. I tried:

java.collect();
java.take(10);
java.foreach();

In all of the above cases I failed to get results. Please help me out.
Check the Spark API documentation for converting a Dataset to an RDD:

lazy val rdd: RDD[T]

In your case, create the DataFrame with the selected columns by performing the select, and after that call .rdd; it will convert it to an RDD.

Try persisting the RDD before reading the data from it.
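A minimal sketch of that in Scala, assuming a DataFrame df with COUNTY and VEHICLES columns (the rowRdd name is just for illustration):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.storage.StorageLevel

// select the two columns, then convert the resulting DataFrame to an RDD[Row]
val rowRdd: RDD[Row] = df.select("COUNTY", "VEHICLES").rdd

// optionally persist before triggering actions on it
rowRdd.persist(StorageLevel.MEMORY_ONLY)

// actions now return Array[Row]
rowRdd.take(10).foreach(println)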
For Spark 1.6:

You won't be able to see the results because, when you convert a DataFrame to an RDD, it is converted into an RDD[Row]. Hence, when you try any of collect, take or foreach on it, the result is an Array[Row], and you cannot get at the values directly.

Solution: You can convert each Row into its respective values and get a plain RDD out of it, and then apply foreach and collect to get the values; see the sketch after this answer.

P.S.: The code is written in Scala, but you can get the essence of what I am trying to do!
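The original snippet is not reproduced above, but a minimal sketch of the idea might look like this (it assumes COUNTY is a string column and VEHICLES an integer column; countyVehicles is just an illustrative name):

import org.apache.spark.rdd.RDD

// pull the plain values out of each Row so you end up with an RDD of (String, Int) pairs
val countyVehicles: RDD[(String, Int)] = df.select("COUNTY", "VEHICLES")
  .rdd
  .map(row => (row.getString(0), row.getInt(1)))

// foreach and collect now yield plain tuples instead of Row objects
countyVehicles.collect().foreach { case (county, vehicles) => println(s"$county -> $vehicles") }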
Since Spark 2.0 you can convert a DataFrame to a Dataset in order to use RDD-style operations such as map and flatMap (for an existing DataFrame this is done with as[T]; toDS builds a Dataset from a local collection or RDD). I recommend this great article about mastering Spark 2.0.
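A minimal sketch, assuming Spark 2.x with a SparkSession named spark and the same df as above (the CountyVehicles case class is hypothetical):

case class CountyVehicles(county: String, vehicles: Int)

import spark.implicits._

// turn the untyped DataFrame into a typed Dataset so map/flatMap work on case-class objects
val ds = df.select($"COUNTY".as("county"), $"VEHICLES".as("vehicles")).as[CountyVehicles]

// RDD-style operations, but on a Dataset
val perCounty = ds.map(cv => (cv.county, cv.vehicles))
perCounty.show()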