I want to convert a DataFrame to a Dataset using different case classes.
Right now my code looks like this:
case class Views(views: Double)
case class Clicks(clicks: Double)

def convertViewsDFtoDS(df: DataFrame): Dataset[Views] = {
  df.as[Views]
}

def convertClicksDFtoDS(df: DataFrame): Dataset[Clicks] = {
  df.as[Clicks]
}
So my question is: is there any way to do this with one general function, passing the case class (its type) as an extra parameter?
It seems a bit redundant (the as method on its own already does exactly what you want), but you can write:
import org.apache.spark.sql.{Encoder, Dataset, DataFrame}
def convertTo[T : Encoder](df: DataFrame): Dataset[T] = df.as[T]
or
def convertTo[T](df: DataFrame)(implicit enc: Encoder[T]): Dataset[T] = df.as[T]
Both methods are equivalent and express exactly the same thing: the existence of an implicit Encoder for the type T.
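For example, with a SparkSession named spark and DataFrames viewsDF and clicksDF whose columns match the case classes (these names are placeholders of mine, not from your code), the context-bound version could be used like this:

import spark.implicits._  // supplies the implicit Encoder[Views] / Encoder[Clicks] for the case classes

val viewsDS: Dataset[Views]   = convertTo[Views](viewsDF)
val clicksDS: Dataset[Clicks] = convertTo[Clicks](clicksDF)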
If you want to avoid the implicit parameter, you can pass an explicit Encoder all the way down:
def convertTo[T](df: DataFrame, enc: Encoder[T]): Dataset[T] = df.as[T](enc)
import org.apache.spark.sql.Encoders

convertTo(df, Encoders.product[Clicks])
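As a rough end-to-end sketch of the explicit variant (the SparkSession setup and the sample data are assumptions of mine, not part of the original code), nothing from spark.implicits._ has to be in scope at the call site:

import org.apache.spark.sql.{Encoders, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("df-to-ds").getOrCreate()

// A DataFrame whose single "clicks" column matches the Clicks case class
val clicksDF: DataFrame = spark.createDataFrame(Seq(Clicks(1.0), Clicks(2.0)))

// Build the Encoder explicitly instead of relying on an implicit in scope
val clicksDS: Dataset[Clicks] = convertTo(clicksDF, Encoders.product[Clicks])
clicksDS.show()

Note that the case classes should be defined at the top level (not inside a method) so Spark can derive encoders for them.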