SparkSession.createDataset() only accepts a List, RDD, or Seq, but it doesn't support JavaPairRDD.
So if I have a JavaPairRDD<String, User> that I want to create a Dataset from, would a viable workaround for this SparkSession.createDataset() limitation be to create a wrapper UserMap class that contains two fields, a String and a User, and then call spark.createDataset(userMap, Encoders.bean(UserMap.class))?
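A minimal sketch of the wrapper described in the question. It assumes a hypothetical User bean with a single name field, since the real User class is not shown, and the names UserMapBeans, key, and name are illustrative only. Encoders.bean expects a public class with a public no-arg constructor and a getter/setter pair for each property, so both classes follow the JavaBean pattern.

```java
import java.io.Serializable;

// Hypothetical beans illustrating the wrapper idea from the question.
public class UserMapBeans {

    // Placeholder User bean (assumption): the question does not show the real fields.
    public static class User implements Serializable {
        private String name;

        public User() {}

        public User(String name) { this.name = name; }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Wrapper for one (key, value) pair of a JavaPairRDD<String, User>.
    // Public class, public no-arg constructor, and getters/setters,
    // which is the shape a bean encoder introspects.
    public static class UserMap implements Serializable {
        private String key;
        private User user;

        public UserMap() {}

        public UserMap(String key, User user) {
            this.key = key;
            this.user = user;
        }

        public String getKey() { return key; }
        public void setKey(String key) { this.key = key; }
        public User getUser() { return user; }
        public void setUser(User user) { this.user = user; }
    }
}
```

With Spark on the classpath, a JavaPairRDD<String, User> named pairRdd could then be mapped with pairRdd.map(t -> new UserMap(t._1(), t._2())), and the resulting JavaRDD<UserMap>'s .rdd() passed to spark.createDataset(..., Encoders.bean(UserMap.class)). The bean route yields named columns (key, user) at the cost of an extra class.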
If you can convert the JavaPairRDD to a List<Tuple2<K, V>> (for example with collect()), then you can use the createDataset method that takes a List. Alternatively, you can convert it to an RDD with rdd() and use the createDataset overload that takes an RDD.
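A sketch of both conversions, assuming Spark 2.x or later on the classpath, a local master, and a hypothetical User bean with name and age fields. Encoders.tuple combines a String encoder and a bean encoder into an Encoder<Tuple2<String, User>>, so no wrapper class is needed; the class and method names (PairRddToDataset, samplePairs, viaList, viaRdd) are made up for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class PairRddToDataset {

    // Hypothetical User bean (assumption): public, no-arg constructor, getters/setters.
    // Serializable because parallelizePairs ships these objects to executors.
    public static class User implements Serializable {
        private String name;
        private int age;

        public User() {}

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Encoder for Tuple2<String, User>, built from a String encoder and a bean encoder.
    public static Encoder<Tuple2<String, User>> pairEncoder() {
        return Encoders.tuple(Encoders.STRING(), Encoders.bean(User.class));
    }

    // Builds a small JavaPairRDD<String, User> for the demo.
    public static JavaPairRDD<String, User> samplePairs(SparkSession spark) {
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        List<Tuple2<String, User>> data = Arrays.asList(
                new Tuple2<>("u1", new User("Alice", 30)),
                new Tuple2<>("u2", new User("Bob", 25)));
        return jsc.parallelizePairs(data);
    }

    // Option 1: collect() returns a List<Tuple2<String, User>> on the driver (small data only).
    public static Dataset<Tuple2<String, User>> viaList(SparkSession spark, JavaPairRDD<String, User> pairs) {
        return spark.createDataset(pairs.collect(), pairEncoder());
    }

    // Option 2: rdd() exposes the underlying RDD<Tuple2<String, User>>; data stays distributed.
    public static Dataset<Tuple2<String, User>> viaRdd(SparkSession spark, JavaPairRDD<String, User> pairs) {
        return spark.createDataset(pairs.rdd(), pairEncoder());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pair-rdd-to-dataset")
                .master("local[*]")
                .getOrCreate();
        try {
            JavaPairRDD<String, User> pairs = samplePairs(spark);
            Dataset<Tuple2<String, User>> ds = viaRdd(spark, pairs);
            ds.printSchema();
            ds.show();
        } finally {
            spark.stop();
        }
    }
}
```

collect() pulls every pair onto the driver, so the List route only suits small data; rdd() keeps the data distributed. Either way the resulting Dataset has tuple columns _1 and _2 rather than the named columns a UserMap bean would give.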