How to drop rows with nulls in one column in PySpark

Posted 2020-02-25 22:52

I have a DataFrame and I would like to drop all rows with a NULL value in one of the columns (a string column). I can easily count them:

df.filter(df.col_X.isNull()).count()

I have tried dropping them with the following command. It executes, but the count above still comes back positive:

df.filter(df.col_X.isNull()).drop()

Other attempts return an 'object is not callable' error.

4 Answers
Anthone · 2020-02-25 23:10

Use either na.drop with a subset:

df.na.drop(subset=["col_X"])

or filter with isNotNull():

df.filter(df.col_X.isNotNull())
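
A minimal sketch showing that both approaches give the same result, assuming an existing SparkSession named spark and some illustrative sample data:

# toy DataFrame with one null in col_X
df = spark.createDataFrame([("a",), (None,), ("b",)], ["col_X"])

df.na.drop(subset=["col_X"]).count()      # 2 -- the null row is gone
df.filter(df.col_X.isNotNull()).count()   # 2 -- same result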
虎瘦雄心在 · 2020-02-25 23:17

DataFrames are immutable, so applying a filter that keeps only the non-null values creates a new DataFrame without the records that contain nulls. Assign the result back to the variable:

df = df.filter(df.col_X.isNotNull())
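
A short sketch of that point, with assumed sample data, showing that the original DataFrame is untouched until the name is rebound:

df = spark.createDataFrame([(1, "x"), (2, None)], ["id", "col_X"])

filtered = df.filter(df.col_X.isNotNull())
df.count()        # 2 -- the original DataFrame still has the null row
filtered.count()  # 1 -- the filter produced a new DataFrame

df = filtered     # rebind the name, as in the line above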
Explosion°爆炸 · 2020-02-25 23:21

Sometimes you also want to drop empty strings. Note that the two conditions must be combined with & (both must hold for a row to be kept), and the comparison needs its own parentheses because Python's | and & bind more tightly than !=:

df = df.filter(df.col_X.isNotNull() & (df.col_X != ""))
Bombasti · 2020-02-25 23:28

Another variation uses where, an alias for filter, together with the col function:

from pyspark.sql.functions import col

df = df.where(col("columnName").isNotNull())
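
One reason to prefer col over df.col_X is that it refers to the column by name, so the filter can sit inside a method chain where no DataFrame variable is in scope. A sketch, where the parquet path and column name are placeholders:

from pyspark.sql.functions import col

cleaned = (
    spark.read.parquet("/path/to/data")        # hypothetical input
         .where(col("columnName").isNotNull())
)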