I want to load a simple JSON file into my SparkSession using an explicit schema. Each record is an employee with an array of addresses. A sample record is below:
{"firstName":"Neil","lastName":"Irani", "addresses" : [ { "city" : "Brindavan", "state" : "NJ" }, { "city" : "Subala", "state" : "DT" }]}
I'm trying to create the schema for loading this JSON, and I believe there is something wrong with the way I build it below. Please advise. The code is in Java, and I could not find a reasonable sample.
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("addresses", DataTypes.createStructType(addressFields), true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
Dataset<Employee> rowDataset = sparkSession.read()
        .option("inferSchema", "false")
        .schema(employeeSchema)
        .json("simple_employees.json")
        .as(employeeEncoder);
Update
I was not creating the array type. The addresses field has to be an ArrayType that wraps the address struct, not the struct itself. The code below works fine:
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
ArrayType addressStruct = DataTypes.createArrayType(DataTypes.createStructType(addressFields));
employeeFields.add(DataTypes.createStructField("addresses", addressStruct, true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
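For reference, here is a self-contained sketch that puts the corrected schema together with the read call. The class name EmployeeSchemaSketch, the helper employeeSchema(), the app name, and the local[*] master are assumptions for illustration, not part of the original code. It reads into Dataset<Row> rather than Dataset<Employee>, because the Employee bean and its encoder are not shown above.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;

// Hypothetical wrapper class, used only to make the sketch runnable.
public class EmployeeSchemaSketch {

    // Builds the employee schema with "addresses" as an array of structs.
    static StructType employeeSchema() {
        StructType address = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("city", DataTypes.StringType, true),
                DataTypes.createStructField("state", DataTypes.StringType, true),
                DataTypes.createStructField("zip", DataTypes.StringType, true)));

        // The JSON value is a list of address objects, so wrap the struct in an ArrayType.
        ArrayType addresses = DataTypes.createArrayType(address);

        return DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("firstName", DataTypes.StringType, true),
                DataTypes.createStructField("lastName", DataTypes.StringType, true),
                DataTypes.createStructField("email", DataTypes.StringType, true),
                DataTypes.createStructField("addresses", addresses, true)));
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for trying this out.
        SparkSession spark = SparkSession.builder()
                .appName("employee-schema-sketch")
                .master("local[*]")
                .getOrCreate();

        // Supplying a schema means Spark does not need to infer one from the file.
        Dataset<Row> employees = spark.read()
                .schema(employeeSchema())
                .json("simple_employees.json");

        employees.printSchema();
        spark.stop();
    }
}
```

Fields that are in the schema but missing from a record, such as email and zip in the sample above, should simply come back as null, since every field is declared nullable.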