How to add an empty map type column to a DataFrame?

Posted 2019-02-14 15:44

Question:

I want to add a new map type column to a dataframe, like this:

|-- cMap: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)

I tried the code:

df.withColumn("cMap", lit(null).cast(MapType)).printSchema

The error is :

<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column <and>
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
cannot be applied to (org.apache.spark.sql.types.MapType.type)

Is there another way to cast the new column to Map or MapType? Thanks.

Answer 1:

Unlike other types, MapType isn't an object you can use as-is (it's not an object extending DataType); you have to call MapType.apply(...), which expects the key and value types as arguments and returns an instance of the MapType class:

df.withColumn("cMap", lit(null).cast(MapType(StringType, StringType))) 

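A quick way to check the resulting schema (a minimal sketch; the toy DataFrame built with spark.range(1) is an assumption, chosen to match the id column shown in answer 3):

scala> spark.range(1).withColumn("cMap", lit(null).cast(MapType(StringType, StringType))).printSchema
root
 |-- id: long (nullable = false)
 |-- cMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Note that this gives a column whose value is NULL in every row, not an empty map; see the next answer for an actual empty map.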

Answer 2:

I had the same problem; I finally found a solution:

df.withColumn("cMap", typedLit(Map.empty[String, String])) 

From the ScalaDoc for typedLit:

The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
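Unlike the lit(null) cast in answer 1, typedLit yields an actual empty map rather than a NULL value. A minimal sketch to illustrate the difference (the spark.range(1) toy DataFrame is an assumption):

import org.apache.spark.sql.functions.typedLit

val df2 = spark.range(1).withColumn("cMap", typedLit(Map.empty[String, String]))

// Every row now holds a real (non-null) empty map; show() renders it
// as {} in Spark 3.x (older versions print it differently).
df2.show()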



Answer 3:

You can be as Scala-like as in the other answer(s), or use a little trick with stringified types.

val withMapCol = df.withColumn("cMap", lit(null) cast "map<string, string>")
scala> withMapCol.printSchema
root
 |-- id: long (nullable = false)
 |-- cMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

You can use any type that Spark SQL supports this way (the full list is in the dataType rule of Spark's SqlBase.g4 grammar):

dataType
    : complex=ARRAY '<' dataType '>'                            #complexDataType
    | complex=MAP '<' dataType ',' dataType '>'                 #complexDataType
    | complex=STRUCT ('<' complexColTypeList? '>' | NEQ)        #complexDataType
    | identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')?  #primitiveDataType
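For example, the same stringified-type trick handles arrays, structs and parameterized primitives alike (a sketch; the column names are just illustrative):

val withMoreCols = df
  .withColumn("anArray", lit(null) cast "array<int>")
  .withColumn("aStruct", lit(null) cast "struct<a:int,b:string>")
  .withColumn("aDecimal", lit(null) cast "decimal(10,2)")

scala> withMoreCols.printSchema
root
 |-- id: long (nullable = false)
 |-- anArray: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- aStruct: struct (nullable = true)
 |    |-- a: integer (nullable = true)
 |    |-- b: string (nullable = true)
 |-- aDecimal: decimal(10,2) (nullable = true)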