Question:

Is there any difference in semantics between df.na().drop() and df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull() && !df.col("onlyColumnInOneColumnDataFrame").isNaN()), where df is an Apache Spark DataFrame?

Or should I consider it a bug if, afterwards, the first one does NOT return any null (not the string "null", but an actual null value) in the column onlyColumnInOneColumnDataFrame, while the second one does?

EDIT: added !isNaN() as well. onlyColumnInOneColumnDataFrame is the only column in the given DataFrame. Let's say its type is Integer.

Answer 1:

With df.na.drop() you drop the rows containing any null or NaN values.

With df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull()) you drop those rows which have null only in the column onlyColumnInOneColumnDataFrame.

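To get the same result restricted to that column, pass the column as a subset: df.na.drop(subset=["onlyColumnInOneColumnDataFrame"]) in PySpark, or df.na.drop(Seq("onlyColumnInOneColumnDataFrame")) in Scala.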
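To make the distinction concrete, here is a minimal plain-Python sketch that only models the two behaviours described above. It does not call Spark; the helper names (is_missing, na_drop, filter_not_null_not_nan) and the two-column sample rows are hypothetical and exist only for illustration. None stands in for a Spark null.

```python
import math


def is_missing(value):
    """True for None (modelling a Spark null) or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_drop(rows, subset=None):
    """Model of na.drop(): drop a row if ANY considered column is null/NaN.

    With subset=None every column is considered; otherwise only the
    columns named in subset are checked.
    """
    def keep(row):
        columns = row.keys() if subset is None else subset
        return not any(is_missing(row[c]) for c in columns)

    return [row for row in rows if keep(row)]


def filter_not_null_not_nan(rows, column):
    """Model of filter(col.isNotNull() AND NOT col.isNaN()) on ONE column."""
    return [row for row in rows if not is_missing(row[column])]


# Hypothetical sample data with two columns, "a" and "b".
rows = [
    {"a": 1, "b": 1.0},
    {"a": None, "b": 2.0},
    {"a": 3, "b": float("nan")},
    {"a": 4, "b": None},
]

dropped_all = na_drop(rows)               # checks every column
dropped_a = na_drop(rows, subset=["a"])   # checks only "a"
filtered_a = filter_not_null_not_nan(rows, "a")

print([r["a"] for r in dropped_all])
print([r["a"] for r in dropped_a])
print([r["a"] for r in filtered_a])
```

In this model the unrestricted drop keeps only the first row, while the subset drop and the single-column filter keep the same three rows. On a DataFrame with a single column, as in the question, all three variants would be expected to agree.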

Answer 2:

I do not know if you got your answer. But this should work:

df.na.drop(subset=["onlyColumnInOneColumnDataFrame"])

or even:

df.na.drop(how = 'any')
