Get dataframe schema and load it into a metadata table

Posted on 2020-04-30 02:17

The use case is to read a file and create a dataframe on top of it. After that, get the schema of that file and store it in a DB table.

For example purposes I am just creating a case class and calling printSchema; however, I am unable to create a dataframe out of the schema.

Here is some sample code:

case class Employee(Name:String, Age:Int, Designation:String, Salary:Int, ZipCode:Int)

val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.master", "local")
.getOrCreate()

import spark.implicits._
val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
val dfschema = Employee_DataFrame.schema

Now dfschema is a StructType and I want to convert it into a dataframe of two columns (column name and data type). How can I achieve that?

2 Answers
够拽才男人
#2 · 2020-04-30 03:00

Try this -

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

//-- For a local file ("wholeFile" was renamed to "multiLine" in Spark 2.2)
val rdd = spark.read.option("multiLine", true).option("delimiter", ",")
  .csv("file:///file/path/file.csv").rdd
  // CSV fields are read as strings, so cast the numeric columns to Int
  .map(r => Row(r.getString(0), r.getString(1).toInt, r.getString(2),
    r.getString(3).toInt, r.getString(4).toInt))

val schema = StructType(Seq(StructField("Name", StringType, true),
                            StructField("Age", IntegerType, true),
                            StructField("Designation", StringType, true),
                            StructField("Salary", IntegerType, true),
                            StructField("ZipCode", IntegerType, true)))

val df = spark.createDataFrame(rdd, schema)
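
From there, one way to get the two-column dataframe of the schema that the question asks for, and persist it to a DB table, is sketched below; the JDBC URL, table name, and credentials are placeholder assumptions, not part of the original code:

import spark.implicits._

// Flatten the schema into (column name, data type) pairs
val schemaDf = df.schema.fields
  .map(f => (f.name, f.dataType.simpleString))
  .toSeq
  .toDF("column_name", "data_type")

// Write the schema rows to a DB table over JDBC
schemaDf.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb") // placeholder URL
  .option("dbtable", "schema_metadata")              // placeholder table
  .option("user", "user")                            // placeholder
  .option("password", "password")                    // placeholder
  .save()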
forever°为你锁心
#3 · 2020-04-30 03:03

Spark >= 2.4.0

In order to save the schema in a string format, you can use the toDDL method of StructType. In your case the DDL representation would be:

`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT

After saving the schema, you can load it from the database and parse it with StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create a new dataframe with spark.createDataFrame, as @Ajay already mentioned.
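
Putting that together, a minimal sketch of the round trip (persisting the DDL string to your metadata table is left as a comment, since the table details are not given):

import org.apache.spark.sql.types.StructType

// Save: serialize the schema to its DDL string form
val ddl = Employee_DataFrame.schema.toDDL
// ... write `ddl` into your metadata table here ...

// Load: read the DDL string back from the database and rebuild the schema
val restored: StructType = StructType.fromDDL(ddl)

// Use the restored schema, e.g. while reading the file again
val df2 = spark.read.schema(restored).csv("file:///file/path/file.csv")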

It is also useful to remember that you can always extract the schema of a case class with:

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]

And then you can get the DDL representation with empSchema.toDDL.
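
For the Employee case class above, empSchema.toDDL yields the DDL string shown earlier:

empSchema.toDDL
// `Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT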

Spark < 2.4

For Spark < 2.4, use DataType.fromDDL and schema.simpleString instead. Also, rather than a StructType, you should work with a DataType instance, omitting the cast to StructType, as follows:

val empSchema = ScalaReflection.schemaFor[Employee].dataType

Sample output for empSchema.simpleString:

struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
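
As a sketch of that round trip (assuming, per the above, that DataType.fromDDL in your Spark version accepts the struct<...> form):

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.DataType

// Extract the schema as a plain DataType (no cast to StructType)
val empSchema = ScalaReflection.schemaFor[Employee].dataType

// Save: the compact struct<...> representation
val stored = empSchema.simpleString

// Load: parse the stored string back into a DataType
val restored: DataType = DataType.fromDDL(stored)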