Sorry for asking a simple question. I want to pass a case class as a function argument and use it further inside the function. So far I have tried this with TypeTag
and ClassTag
but for some reason I am unable to use them properly, or maybe I am not looking in the right place.
The use case is something similar to this:
case class infoData(colA: Int, colB: String)
case class someOtherData(col1: String, col2: String, col3: Int)

def readCsv[T: ???](path: String, passedCaseClass: ???): Dataset[???] = {
  sqlContext
    .read
    .option("header", "true")
    .csv(path)
    .as[passedCaseClass]
}
It will be called something like this:
val infoDf = readCsv("/src/main/info.csv", infoData)
val otherDf = readCsv("/src/main/someOtherData.csv", someOtherData)
First change your function definition to:
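Something along these lines (a sketch, keeping the sqlContext and the lowercase case class names from the question):

import org.apache.spark.sql.{Dataset, Encoder}

def readCsv[T](path: String)(implicit encoder: Encoder[T]): Dataset[T] = {
  sqlContext
    .read
    .option("header", "true")
    .csv(path)
    .as[T]   // the Encoder[T] in scope tells Spark how to map each Row to a T
}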
You don't need to perform any kind of reflection to create a generic readCsv function. The key here is that Spark needs the encoder at compile time, so you can pass it as an implicit parameter and the compiler will add it for you.
Because Spark SQL can deserialize product types (your case classes) with the default encoders, it is easy to call your function like:
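For example, with the case classes and paths from the question (the encoder for each case class is derived automatically once the implicits are imported):

import sqlContext.implicits._   // provides Encoder instances for case classes

val infoDf = readCsv[infoData]("/src/main/info.csv")
val otherDf = readCsv[someOtherData]("/src/main/someOtherData.csv")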
Hope it helps
There are two things which you should pay attention to:
1. Class names should be in CamelCase, so InfoData.
2. It is a Dataset, not a DataFrame. DataFrame is a special name for a Dataset of the general-purpose Row.
What you need is to ensure that your provided class has an implicit instance of the corresponding Encoder in the current scope. Encoder instances for primitive types (Int, String, etc.) and case classes can be obtained by importing spark.implicits._
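With that in place, readCsv can simply take the Encoder as an implicit parameter. A sketch, assuming a SparkSession named spark (the same one spark.implicits._ comes from):

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

val spark: SparkSession = SparkSession.builder().getOrCreate()   // assumed session

def readCsv[T](path: String)(implicit encoder: Encoder[T]): Dataset[T] = {
  spark
    .read
    .option("header", "true")
    .csv(path)
    .as[T]
}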
Or, you can use a context bound:
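For example, T: Encoder below is just shorthand for the implicit Encoder[T] parameter above:

def readCsv[T: Encoder](path: String): Dataset[T] = {
  spark
    .read
    .option("header", "true")
    .csv(path)
    .as[T]
}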
Now, you can use it as follows:
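For instance, with a CamelCase version of the case class from the question and the implicits in scope:

import spark.implicits._   // brings the Encoder for InfoData into scope

// define case classes at top level (not inside a method) so Spark can derive an Encoder
case class InfoData(colA: Int, colB: String)

val infoDs: Dataset[InfoData] = readCsv[InfoData]("/src/main/info.csv")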