Closely related to: Spark Dataframe column with last character of other column
but I want to extract multiple characters from the end of the string, not just the last one.
I have the following PySpark DataFrame df:
+----------+----------+
|    number|event_type|
+----------+----------+
|0342224022|        11|
|0112964715|        11|
+----------+----------+
I want to extract the last 3 characters of the number column.
I tried the following:
from pyspark.sql.functions import substring
df.select(substring(df['number'], -1, 3), 'event_type').show(2)
# which returns:
+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                     2|        11|
|                     5|        11|
+----------------------+----------+
The expected output is below (I'm not sure why I get the output above instead):
+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                   022|        11|
|                   715|        11|
+----------------------+----------+
What am I doing wrong?
Note: Spark version 1.6.0
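A likely explanation: Spark's substring(str, pos, len) uses 1-based positions, and a negative pos counts back from the end of the string. So pos=-1 starts at the last character, and only one character is left to take no matter what len is. The plain-Python function below is a rough model of those Hive-style semantics, written as an assumption for illustration (the name spark_substring is hypothetical and is not part of PySpark). It reproduces both outputs above: starting at -1 yields one character, while starting at -3 yields the last three.

```python
def spark_substring(s: str, pos: int, length: int) -> str:
    """Hypothetical model of Spark SQL substring(str, pos, len) semantics.

    Assumes 1-based positions, where a negative pos counts back from the
    end of the string and pos == 0 behaves like pos == 1.
    """
    n = len(s)
    if pos > 0:
        start = pos - 1      # 1-based -> 0-based index
    elif pos < 0:
        start = n + pos      # -1 -> last character, -3 -> third from the end
    else:
        start = 0            # pos 0 is treated as the first character
    end = start + length
    if end <= start or start >= n:
        return ""            # non-positive length or start past the end
    # Clamp to the string bounds, as Spark returns whatever characters remain.
    return s[max(start, 0):min(end, n)]


for number in ["0342224022", "0112964715"]:
    # pos=-1 leaves only one character; pos=-3 gives the last three.
    print(spark_substring(number, -1, 3), spark_substring(number, -3, 3))
```

Under this model, the PySpark fix would be to start three characters from the end, e.g. df.select(substring(df['number'], -3, 3), 'event_type'), or equivalently df['number'].substr(-3, 3). Negative start positions should follow the same rule on 1.6.0, but that is worth confirming on your cluster.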