Dynamically build case class or schema

Posted 2019-08-16 15:05

Given a list of strings, is there a way to create a case class or a schema without entering the strings manually?

For example, I have a list:

 val name_list=Seq("Bob", "Mike", "Tim")

The List will not always be the same. Sometimes it will contain different names and will vary in size.

I can create a case class

case class Names(Bob: Integer, Mike: Integer, Tim: Integer)

or a schema

 import org.apache.spark.sql.types._

 val schema = StructType(StructField("Bob", IntegerType, true) ::
            StructField("Mike", IntegerType, true) ::
            StructField("Tim", IntegerType, true) :: Nil)

but I have to do it manually. I am looking for a method to perform this operation dynamically.

3 answers
时光不老,我们不散
#2 · 2019-08-16 15:38

If all the fields have the same data type, then you can simply create the schema as:

import org.apache.spark.sql.types._

val name_list = Seq("Bob", "Mike", "Tim")

// One nullable IntegerType field per name
val fields = name_list.map(name => StructField(name, IntegerType, true))

val schema = StructType(fields)

If the fields have different data types, create a map from field name to type and build the schema as above.
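A minimal sketch of that approach, assuming the name-to-type mapping is known up front (the specific types below are placeholders). `ListMap` is used instead of a plain `Map` because it keeps insertion order, and the schema's column order follows the iteration order of the map:

```scala
import scala.collection.immutable.ListMap
import org.apache.spark.sql.types._

// Placeholder mapping from column name to Spark type; ListMap preserves insertion order
val fieldTypes: ListMap[String, DataType] =
  ListMap("Bob" -> StringType, "Mike" -> IntegerType, "Tim" -> DoubleType)

// One nullable StructField per (name, type) entry, in map order
val schema = StructType(fieldTypes.toSeq.map { case (name, dataType) =>
  StructField(name, dataType, nullable = true)
})
```

With an unordered `Map`, the columns could come out in a different order from the one you wrote down, which matters as soon as rows are matched to the schema by position.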

Hope this helps!

唯我独甜
#3 · 2019-08-16 15:39

Assuming the data types of the columns are the same:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val nameList=Seq("Bob", "Mike", "Tim")

val schema = StructType(nameList.map(n => StructField(n, IntegerType, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(Bob,IntegerType,true), StructField(Mike,IntegerType,true), StructField(Tim,IntegerType,true)
// )

// rdd: an existing RDD[Row] whose rows hold one value per name, in the same order
spark.createDataFrame(rdd, schema)
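The answer does not show where `rdd` comes from. A self-contained sketch of one way to build it, assuming a local `SparkSession` and made-up integer data; `Row.fromSeq` lets the row width follow the length of `nameList` instead of being hard-coded:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Local session for experimentation; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("dynamic-schema").getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, nullable = true)))

// Made-up sample data: each Row carries one Int per name, in the same order as the schema
val rows = Seq(
  Row.fromSeq(nameList.indices.map(i => i)),      // 0, 1, 2
  Row.fromSeq(nameList.indices.map(i => i * 10))  // 0, 10, 20
)
val rdd = spark.sparkContext.parallelize(rows)

val df = spark.createDataFrame(rdd, schema)
df.show()
```

Because both the schema and the rows are derived from `nameList`, adding or removing a name changes the DataFrame's width without touching the rest of the code.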

If the data types are different, you'll have to provide them as well (in which case it might not save much time compared with assembling the schema manually):

val typeList = Array[DataType](StringType, IntegerType, DoubleType)
val colSpec = nameList zip typeList

val schema = StructType(colSpec.map(cs => StructField(cs._1, cs._2, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(Bob,StringType,true), StructField(Mike,IntegerType,true), StructField(Tim,DoubleType,true)
// )
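One caveat with the `zip` approach: `zip` silently drops the extra elements when the two sequences differ in length, so a missing type would quietly produce a shorter schema. A minimal guarded sketch, assuming the names and types arrive as two separate sequences (`schemaFor` is a hypothetical helper name):

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: pairs each column name with its type, failing fast on a length mismatch
def schemaFor(names: Seq[String], types: Seq[DataType]): StructType = {
  require(names.size == types.size, s"got ${names.size} names but ${types.size} types")
  StructType(names.zip(types).map { case (name, dataType) =>
    StructField(name, dataType, nullable = true)
  })
}

val schema = schemaFor(Seq("Bob", "Mike", "Tim"), Seq(StringType, IntegerType, DoubleType))
```

`require` throws an `IllegalArgumentException` on a mismatch, which surfaces the problem where the schema is built rather than later, when rows fail to line up with the columns.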
够拽才男人
#4 · 2019-08-16 15:43

All the answers above cover only one aspect: creating the schema. Here is one solution you can use to create a case class from the generated schema: https://gist.github.com/yoyama/ce83f688717719fc8ca145c3b3ff43fd
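A separate sketch, not necessarily how the linked gist works: generate the source text of a case class from a `StructType`. A case class has to exist at compile time to be used as a type, so the generated text would still need to be written to a source file and compiled (for example, as a build step). The type mapping below covers only a few primitive types and is an assumption, and nullable fields are rendered as `Option[...]`; `scalaTypeName` and `caseClassSource` are hypothetical helper names:

```scala
import org.apache.spark.sql.types._

// Hypothetical mapping from a few Spark types to Scala type names; extend as needed
def scalaTypeName(dt: DataType): String = dt match {
  case IntegerType => "Int"
  case LongType    => "Long"
  case DoubleType  => "Double"
  case StringType  => "String"
  case BooleanType => "Boolean"
  case other       => throw new IllegalArgumentException(s"unsupported type: $other")
}

// Renders a case class definition whose fields mirror the schema;
// nullable fields become Option[...]
def caseClassSource(className: String, schema: StructType): String = {
  val fields = schema.fields.map { f =>
    val base = scalaTypeName(f.dataType)
    val tpe  = if (f.nullable) s"Option[$base]" else base
    s"${f.name}: $tpe"
  }
  s"case class $className(${fields.mkString(", ")})"
}

val schema = StructType(Seq("Bob", "Mike", "Tim").map(n => StructField(n, IntegerType, nullable = true)))
println(caseClassSource("Names", schema))
// prints: case class Names(Bob: Option[Int], Mike: Option[Int], Tim: Option[Int])
```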
