Given a list of strings, is there a way to create a case class or a schema without inputting the strings manually?
For example, I have a list:
val name_list=Seq("Bob", "Mike", "Tim")
The List will not always be the same. Sometimes it will contain different names and will vary in size.
I can create a case class
case class names(Bob:Integer, Mike:Integer, Tim:Integer)
or a schema
val schema = StructType(StructField("Bob", IntegerType,true)::
StructField("Mike", IntegerType,true)::
StructField("Tim", IntegerType,true)::Nil)
but I have to do it manually. I am looking for a method to perform this operation dynamically.
Assuming all the columns have the same data type:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val nameList=Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,IntegerType,true), StructField(Mike,IntegerType,true), StructField(Tim,IntegerType,true)
// )
spark.createDataFrame(rdd, schema)
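The `rdd` passed to `createDataFrame` is not defined in the snippet; it is assumed to be an `RDD[Row]` whose rows have one value per name, in the same order as the list. A minimal end-to-end sketch for spark-shell, with made-up sample rows and a local session (both are assumptions for illustration):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// In spark-shell `spark` already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("dynamic-schema").getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")

// One nullable integer column per name.
val schema = StructType(nameList.map(n => StructField(n, IntegerType, nullable = true)))

// Made-up sample data: each Row carries one value per column, in list order.
val rdd = spark.sparkContext.parallelize(Seq(Row(1, 2, 3), Row(4, 5, 6)))

val df = spark.createDataFrame(rdd, schema)
df.printSchema()
```

If the number of values in a row does not match the number of names, the mismatch only surfaces at runtime when the data is evaluated, so it is worth checking the row width against `nameList.size` when the list changes.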
If the data types are different, you'll have to provide them as well (in which case it might not save much time compared with assembling the schema manually):
val typeList = Array[DataType](StringType, IntegerType, DoubleType)
val colSpec = nameList zip typeList
val schema = StructType(colSpec.map(cs => StructField(cs._1, cs._2, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,StringType,true), StructField(Mike,IntegerType,true), StructField(Tim,DoubleType,true)
// )
If all the fields have the same data type, then you can simply create the schema like this:
val name_list=Seq("Bob", "Mike", "Tim")
val fields = name_list.map(name => StructField(name, IntegerType, true))
val schema = StructType(fields)
If the fields have different data types, then create a map of field names to types and build the schema from it in the same way as above.
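A rough sketch of the map-based variant, assuming a hypothetical name-to-type mapping. A `ListMap` is used here because a plain `Map` does not guarantee iteration order once it grows past a few entries, and column order usually matters for a schema:

```scala
import scala.collection.immutable.ListMap
import org.apache.spark.sql.types._

// Hypothetical mapping of column name to Spark data type (insertion order is kept).
val fieldTypes: ListMap[String, DataType] = ListMap(
  "Bob"  -> StringType,
  "Mike" -> IntegerType,
  "Tim"  -> DoubleType
)

// Turn every (name, type) pair into a nullable StructField.
val schema = StructType(fieldTypes.toSeq.map { case (name, dataType) =>
  StructField(name, dataType, nullable = true)
})
```

A `Seq[(String, DataType)]` would work just as well if you do not need lookups by name.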
Hope this helps!
All the answers above cover only one aspect, which is creating the schema. Here is one solution you can use to create the case class from the generated schema:
https://gist.github.com/yoyama/ce83f688717719fc8ca145c3b3ff43fd
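Keep in mind that a case class is a compile-time construct, so it cannot simply be instantiated from a list that is only known at runtime; what you can do is generate its source text from a schema. The following is not the code from the linked gist, only a minimal sketch of one possible approach. The helper names `scalaTypeName` and `caseClassSource` are made up for illustration, and only a handful of types are mapped:

```scala
import org.apache.spark.sql.types._

// Maps a few Spark types to Scala type names; anything else falls back to "Any".
def scalaTypeName(dt: DataType): String = dt match {
  case IntegerType => "Int"
  case LongType    => "Long"
  case DoubleType  => "Double"
  case StringType  => "String"
  case BooleanType => "Boolean"
  case _           => "Any"
}

// Builds the source text of a case class; nullable fields become Option[...].
def caseClassSource(className: String, schema: StructType): String = {
  val params = schema.fields.map { f =>
    val base = scalaTypeName(f.dataType)
    val tpe  = if (f.nullable) s"Option[$base]" else base
    s"${f.name}: $tpe"
  }
  s"case class $className(${params.mkString(", ")})"
}

val schema = StructType(
  Seq("Bob", "Mike", "Tim").map(n => StructField(n, IntegerType, nullable = true))
)

println(caseClassSource("Names", schema))
// prints: case class Names(Bob: Option[Int], Mike: Option[Int], Tim: Option[Int])
```

The generated string still has to be compiled (or pasted into your code) before it can be used, and field names that are not valid Scala identifiers would need backticks, which this sketch does not handle.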