Here is what I'm trying to achieve:
types = ["200","300"]
def Count(ID):
    cnd = F.when((**F.col("type") in types**), 1).otherwise(F.lit(0))
    return F.sum(cnd).alias("CountTypes")
The syntax in bold is not correct. Any suggestions on how to write this correctly in PySpark?
I'm not sure what you are trying to achieve, but here is the correct syntax. Python's in operator does not work on a Column, so use the column's isin method instead:
from pyspark.sql import functions as F
types = ["200", "300"]
cnd = F.when(F.col("type").isin(types), F.lit(1)).otherwise(F.lit(0))
sum_on_cnd = F.sum(cnd).alias("count_types")
# Column<b'sum(CASE WHEN (type IN (200, 300)) THEN 1 ELSE 0 END) AS `count_types`'>