Support for User Defined Types for java in Spark

Published 2019-07-21 15:17

Question:

Is there support for UDT for java in spark?

Does JavaSQLContext support User Defined Types(UDTs) when converting JavaRDD to JavaSchemaRDD?

If yes, is there any sample to demonstrate the capability.

Answer 1:

Yes, the simplest way is to have the schema inferred via reflection. See the Spark SQL documentation, click on the Java tab, and read the section labelled

Inferring the Schema Using Reflection
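Under the hood, the reflection path derives column names and types from a JavaBean's getters. A minimal, Spark-free sketch of that idea using only the JDK's `Introspector` (the `Person` bean and the `inferColumns` helper are made-up illustrations, not part of Spark's API):

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

public class BeanSchemaSketch {

    // Hypothetical JavaBean; Spark would map each getter to a column.
    public static class Person implements java.io.Serializable {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String n) { name = n; }
        public int getAge() { return age; }
        public void setAge(int a) { age = a; }
    }

    // Collect property name -> property type, skipping Object's own
    // properties (e.g. getClass), roughly what schema inference does.
    public static Map<String, Class<?>> inferColumns(Class<?> beanClass)
            throws Exception {
        Map<String, Class<?>> columns = new TreeMap<String, Class<?>>();
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            columns.put(pd.getName(), pd.getPropertyType());
        }
        return columns;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inferColumns(Person.class));
        // prints {age=int, name=class java.lang.String}
    }
}
```

This is only the flat case: a bean whose properties are themselves beans is where the Java API gets awkward, which is what the edit below works around by building the nested schema by hand.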

Edit from comments

I'm not sure that the Java API is as fully fleshed out as the Scala one, so it seems that in order to nest types you may need to build the schema yourself:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.StructType;

// First create the address struct
List<StructField> addressFields = new ArrayList<StructField>();
addressFields.add(DataType.createStructField("street", DataType.StringType, true));
StructType addressStruct = DataType.createStructType(addressFields);

// Then create the person, nesting the address struct
List<StructField> personFields = new ArrayList<StructField>();
personFields.add(DataType.createStructField("name", DataType.StringType, true));
personFields.add(DataType.createStructField("age", DataType.IntegerType, true));
personFields.add(DataType.createStructField("address", addressStruct, true));

StructType schema = DataType.createStructType(personFields);