I am transforming SQL code to PySpark code and came across some SQL statements. I don't know how to approach CASE statements in PySpark. I am planning on creating an RDD and then using rdd.map to do some logic checks. Is that the right approach? Please help!
Basically I need to go through each row in the RDD or DataFrame and, based on some logic, edit one of the column values.
case
  when (e."a" Like 'a%' Or e."b" Like 'b%')
    And e."aa"='BW' And cast(e."abc" as decimal(10,4))=75.0 Then 'callitA'
  when (e."a" Like 'b%' Or e."b" Like 'a%')
    And e."aa"='AW' And cast(e."abc" as decimal(10,4))=75.0 Then 'callitB'
  else 'CallitC'
end
I'm not good in Python, but I will try to give some pointers from what I have done in Scala. Mapping over an RDD is one approach; withColumn is another. The DataFrame.withColumn method in PySpark supports adding a new column or replacing an existing column of the same name. In this context you have to deal with a Column, either via a Spark UDF or via the when/otherwise syntax. For example:
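Here is a minimal PySpark sketch of the CASE statement above using when/otherwise. The DataFrame, its column names (a, b, aa, abc), the sample rows, and the output column name "category" are all assumptions; adjust them to your schema.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; replace with your own DataFrame.
df = spark.createDataFrame(
    [("apple", "x", "BW", "75.0"),
     ("berry", "x", "AW", "75.0"),
     ("x", "x", "ZZ", "10.0")],
    ["a", "b", "aa", "abc"],
)

# Each when() mirrors one branch of the SQL CASE; otherwise() is the ELSE.
df = df.withColumn(
    "category",
    F.when(
        (F.col("a").like("a%") | F.col("b").like("b%"))
        & (F.col("aa") == "BW")
        & (F.col("abc").cast("decimal(10,4)") == 75.0),
        "callitA",
    )
    .when(
        (F.col("a").like("b%") | F.col("b").like("a%"))
        & (F.col("aa") == "AW")
        & (F.col("abc").cast("decimal(10,4)") == 75.0),
        "callitB",
    )
    .otherwise("CallitC"),
)

df.show()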
You can use a udf instead of when/otherwise as well.
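For completeness, here is the same logic sketched as a Python UDF; the function name classify and the null handling are assumptions, and a plain Python UDF is usually slower than the built-in when/otherwise expressions.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def classify(a, b, aa, abc):
    # Guard against nulls; the real null-handling rules are an assumption.
    a, b, aa = a or "", b or "", aa or ""
    is_75 = abc is not None and float(abc) == 75.0
    if (a.startswith("a") or b.startswith("b")) and aa == "BW" and is_75:
        return "callitA"
    if (a.startswith("b") or b.startswith("a")) and aa == "AW" and is_75:
        return "callitB"
    return "CallitC"

classify_udf = F.udf(classify, StringType())

# Reuses the df from the previous example.
df = df.withColumn("category", classify_udf("a", "b", "aa", "abc"))
df.show()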