The use case is to read a file, create a dataframe on top of it, and then get the schema of that file and store it into a DB table.
For example purposes I am just creating a case class and calling printSchema, however I am unable to create a dataframe out of the schema itself.
Here is a sample code:
case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.master", "local")
  .getOrCreate()

import spark.implicits._

val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
val dfschema = Employee_DataFrame.schema
Now dfschema is a StructType and I want to convert it into a dataframe of two columns. How can I achieve that?
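One way to get that two-column dataframe is to map over the schema's fields and turn each one into a (name, type) pair (a sketch; the column names "ColumnName" and "DataType" below are my own choice, not from the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("schema-to-dataframe")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

val employeeDF = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798)).toDF

// One row per column of the original schema: (column name, data type)
val schemaDF = employeeDF.schema.fields
  .map(f => (f.name, f.dataType.simpleString))
  .toSeq
  .toDF("ColumnName", "DataType")
```

schemaDF then has five rows (one per Employee field) and can be written to a DB table like any other dataframe.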
Try this -
Spark >= 2.4.0
In order to save the schema into a string format you can use the toDDL method of the StructType. In your case the DDL format is a comma-separated list of column name and type pairs. After saving the schema you can load it from the database and use it as StructType.fromDDL(my_schema). This will return an instance of StructType, which you can use to create the new dataframe with spark.createDataFrame, as @Ajay already mentioned. It is also useful to remember that you can always extract the schema given a case class with:
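A sketch of that extraction, using Spark's internal ScalaReflection helper (the empSchema name matches the reference just below):

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

// Derive the Spark schema from the case class, then cast the
// resulting DataType down to StructType
val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]
```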
And then you can get the DDL representation with
empSchema.toDDL
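Putting the pieces together, a save-and-restore round trip might look like this (a sketch; the hand-built schema is my own stand-in for the one read from the file):

```scala
import org.apache.spark.sql.types._

// Assumed stand-in for the schema extracted from the file
val empSchema = StructType(Seq(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))

// DDL string to store in the database, e.g. something like "Name STRING,Age INT"
// (exact identifier quoting varies by Spark version)
val ddl = empSchema.toDDL

// Later: load the string back from the database and rebuild the StructType
val restored = StructType.fromDDL(ddl)
```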
Spark < 2.4
For Spark < 2.4 use DataType.fromDDL and schema.simpleString accordingly. Also, instead of returning a StructType, you should use a DataType instance, omitting the cast to StructType, as next:
Sample output for empSchema.simpleString:
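A minimal sketch of that pre-2.4 variant (the Employee case class is the one from the question):

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.DataType

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

// Keep the DataType instance; no cast to StructType this time
val empSchema: DataType = ScalaReflection.schemaFor[Employee].dataType

// simpleString yields something like: struct<Name:string,Age:int,...>
val schemaString = empSchema.simpleString

// Parse the stored string back into a DataType
val restored: DataType = DataType.fromDDL(schemaString)
```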