Question:
How can I implement a SQL-like EXISTS clause using the Spark DataFrame API?
Answer 1:
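A LEFT SEMI JOIN is the DataFrame equivalent of a SQL EXISTS subquery: it keeps each row of the left DataFrame that has at least one match in the right DataFrame, and returns only the left DataFrame's columns.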
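import spark.implicits._  // needed for toDF; `spark` is the active SparkSession (predefined in spark-shell)
val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"), ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")
val CodeDF = Seq(("011", "Delhi"), ("022", "Mumbai"), ("033", "Kolkata"), ("044", "Chennai")).toDF("Code", "City")
// Keep only the cities that have a matching row in CodeDF; the result has cityDF's columns only.
val finalDF = cityDF.join(CodeDF, cityDF("City") === CodeDF("City"), "left_semi")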
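To see the correspondence with SQL directly, the sketch below runs the same data through a correlated EXISTS subquery and through the left semi join. It is a minimal illustration, not a definitive implementation: the local SparkSession setup and the view names `cities` and `codes` are assumptions made for the example (in spark-shell, skip the session creation and use the predefined `spark`).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session for experimentation.
val spark = SparkSession.builder().master("local[*]").appName("exists-sketch").getOrCreate()
import spark.implicits._

val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"),
  ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")
val codeDF = Seq(("011", "Delhi"), ("022", "Mumbai"), ("033", "Kolkata"),
  ("044", "Chennai")).toDF("Code", "City")

// Register temporary views (hypothetical names) so the data can be queried with SQL.
cityDF.createOrReplaceTempView("cities")
codeDF.createOrReplaceTempView("codes")

// SQL form: keep a city only if at least one code row refers to it.
val existsSqlDF = spark.sql(
  """SELECT c.City, c.Country
    |FROM cities c
    |WHERE EXISTS (SELECT 1 FROM codes k WHERE k.City = c.City)""".stripMargin)

// DataFrame form: the left semi join from the answer above.
val semiJoinDF = cityDF.join(codeDF, cityDF("City") === codeDF("City"), "left_semi")
```

Both `existsSqlDF` and `semiJoinDF` should contain the rows for Delhi, Kolkata and Mumbai, in no guaranteed order. Nairobi and Colombo are dropped because no code row references them.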
Answer 2:
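If the set of values to compare against is small, for example a list small enough to be broadcast, you can filter with isin instead. In Scala, isin takes varargs, so expand the collection with `: _*` (the `=== true` comparison is redundant):
import org.apache.spark.sql.functions.col
df.filter(col("columnName").isin(list: _*))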
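As a rough sketch of the small-list case, the example below filters with `isin` and, as an alternative, uses a left semi join against a broadcast DataFrame built from the same list. The list `wanted`, the sample data and the local session setup are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

// Assumption: a throwaway local session; in spark-shell use the predefined `spark`.
val spark = SparkSession.builder().master("local[*]").appName("isin-sketch").getOrCreate()
import spark.implicits._

val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"),
  ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")

// Hypothetical small lookup list held on the driver.
val wanted = Seq("Delhi", "Mumbai", "Chennai")

// isin takes varargs, so the Scala collection is expanded with `: _*`.
val filteredDF = cityDF.filter(col("City").isin(wanted: _*))

// Alternative: turn the list into a DataFrame and semi join it with a broadcast hint.
val wantedDF = wanted.toDF("City")
val broadcastSemiDF = cityDF.join(broadcast(wantedDF), Seq("City"), "left_semi")
```

Both results should contain only the Delhi and Mumbai rows. The `isin` form inlines the values into the filter expression, so it suits short lists; the broadcast semi join may scale better as the lookup set grows, provided it still fits in memory on each executor.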