pyspark conditions on multiple columns and returning a new column

Published 2020-08-04 09:40

Question:

I am using Spark 2.1 and scripting in PySpark. Please help me with this, as I am stuck here.

Problem statement: create a new column based on conditions on multiple columns.

The input dataframe is below:

FLG1 FLG2 FLG3
T    F    T
F    T    T
T    T    F

Now I need to create a new column FLG, and my condition is: if FLG1==T && (FLG2==F || FLG2==T), then FLG should be T, else F.

Consider the above dataframe as DF.

Below is the code snippet I tried:

DF.withColumn("FLG",DF.select(when(FLG1=='T' and (FLG2=='F' or FLG2=='T','F').otherwise('T'))).show()

It didn't work; I got the error "name 'when' is not defined".

Please help me get past this hurdle.

Answer 1:

Try the following; it should work:

from pyspark.sql.functions import col, when, lit

DF.withColumn(
    "FLG",
    when((col("FLG1") == 'T') & ((col("FLG2") == 'F') | (col("FLG2") == 'T')), lit('F'))
    .otherwise(lit('T'))
).show()
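Two things differ from the original attempt: `when` (along with `col` and `lit`) must be imported from `pyspark.sql.functions`, and Spark Column expressions must use `&`/`|` with each comparison parenthesized, since Python's `and`/`or` do not work on Column objects. Stripped of Spark, the row-wise logic this `when(...).otherwise(...)` expression implements is just the following (a plain-Python sketch for illustration, not Spark code; the helper name `flg` is made up):

```python
def flg(flg1: str, flg2: str) -> str:
    # Mirrors the answer's expression: when the condition holds,
    # the expression returns 'F', otherwise 'T'.
    if flg1 == 'T' and (flg2 == 'F' or flg2 == 'T'):
        return 'F'
    return 'T'

# The sample rows from the question (FLG1, FLG2):
rows = [('T', 'F'), ('F', 'T'), ('T', 'T')]
print([flg(f1, f2) for f1, f2 in rows])  # → ['F', 'T', 'F']
```

Note that the answer's code (like the original attempt) outputs 'F' when the condition is true, which is the opposite of the T/F mapping stated in the question text; swap the `lit('F')` and `lit('T')` arguments if the stated mapping is the intended one.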