I want to add a new map-type column to a DataFrame, like this:
|-- cMap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
I tried the code:
df.withColumn("cMap", lit(null).cast(MapType)).printSchema
The error is:
<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column <and>
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
cannot be applied to (org.apache.spark.sql.types.MapType.type)
Is there another way to cast the new column to Map or MapType? Thanks.
Unlike other types, MapType isn't an object you can use as-is (the MapType companion object doesn't extend DataType). You have to call MapType.apply(...), which expects the key and value types as arguments and returns an instance of the MapType class:
df.withColumn("cMap", lit(null).cast(MapType(StringType, StringType)))
I had the same problem; I finally found a solution:
df.withColumn("cMap", typedLit(Map.empty[String, String]))
From the ScalaDocs for typedLit:
The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
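A minimal sketch of what that buys you in practice (assuming Spark 2.2+, where typedLit was introduced, and a SparkSession named spark):

import org.apache.spark.sql.functions.typedLit

// typedLit infers map<string,string> from the Scala type parameters,
// so the column holds an actual (empty) map value rather than null.
val withEmptyMap = spark.range(1).withColumn("cMap", typedLit(Map.empty[String, String]))
withEmptyMap.printSchema()
// root
//  |-- id: long (nullable = false)
//  |-- cMap: map (nullable = false)
//  |    |-- key: string
//  |    |-- value: string (valueContainsNull = true)

Note that, unlike the lit(null) approaches, the resulting column is non-nullable because the literal itself is never null.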
You can be as Scala-typed as in the other answer(s), or use a little trick with stringified types.
val withMapCol = df.withColumn("cMap", lit(null) cast "map<string, string>")
scala> withMapCol.printSchema
root
|-- id: long (nullable = false)
|-- cMap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
You can use any type that Spark SQL supports this way (see the dataType rule in Spark's SQL grammar, SqlBase.g4):
dataType
    : complex=ARRAY '<' dataType '>'                            #complexDataType
    | complex=MAP '<' dataType ',' dataType '>'                 #complexDataType
    | complex=STRUCT ('<' complexColTypeList? '>' | NEQ)        #complexDataType
    | identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')?  #primitiveDataType
    ;
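As an illustration, the same trick with a few other entries from that grammar (a sketch, assuming the spark session from above; the column names are made up):

import org.apache.spark.sql.functions.lit

val moreTypes = spark.range(1)
  .withColumn("anArray", lit(null) cast "array<int>")                 // ARRAY '<' dataType '>'
  .withColumn("aStruct", lit(null) cast "struct<a: string, b: int>")  // STRUCT '<' complexColTypeList '>'
  .withColumn("aDecimal", lit(null) cast "decimal(10, 2)")            // primitive type with precision arguments
moreTypes.printSchema()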