I have:
key value
a [1,2,3]
b [2,3,4]
I want:
key value1 value2 value3
a 1 2 3
b 2 3 4
It seems that in Scala I can write: df.select($"value._1", $"value._2", $"value._3"), but this is not possible in Python.
So is there a good way to do this?
It depends on the type of your "list":
If it is of type ArrayType():
# hc is assumed to be an existing HiveContext/SQLContext and sc the SparkContext
df = hc.createDataFrame(sc.parallelize([['a', [1,2,3]], ['b', [2,3,4]]]), ["key", "value"])
df.printSchema()
df.show()
root
 |-- key: string (nullable = true)
 |-- value: array (nullable = true)
 |    |-- element: long (containsNull = true)
+---+-------+
|key|  value|
+---+-------+
|  a|[1,2,3]|
|  b|[2,3,4]|
+---+-------+
You can access the values as you would in Python, using []:
df.select("key", df.value[0], df.value[1], df.value[2]).show()
+---+--------+--------+--------+
|key|value[0]|value[1]|value[2]|
+---+--------+--------+--------+
|  a|       1|       2|       3|
|  b|       2|       3|       4|
+---+--------+--------+--------+
If it is of type StructType() (maybe you built your dataframe by reading a JSON):
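import pyspark.sql.functions as psf

# Build a struct column from the array elements to illustrate the StructType case
df2 = df.select("key", psf.struct(
        df.value[0].alias("value1"),
        df.value[1].alias("value2"),
        df.value[2].alias("value3")
    ).alias("value"))
df2.printSchema()
df2.show()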
root
 |-- key: string (nullable = true)
 |-- value: struct (nullable = false)
 |    |-- value1: long (nullable = true)
 |    |-- value2: long (nullable = true)
 |    |-- value3: long (nullable = true)
+---+-------+
|key|  value|
+---+-------+
|  a|[1,2,3]|
|  b|[2,3,4]|
+---+-------+
You can directly 'split' the column using *:
df2.select('key', 'value.*').show()
+---+------+------+------+
|key|value1|value2|value3|
+---+------+------+------+
|  a|     1|     2|     3|
|  b|     2|     3|     4|
+---+------+------+------+
Oh, I figured it out. You can use getItem.
df.select(df["value"].getItem(0).alias("value1"),
          df["value"].getItem(1).alias("value2"),
          df["value"].getItem(2).alias("value3"))
does the trick.