In a Pandas dataframe, I would like to filter out all the rows that have more than 2 `NaN`s.
Essentially, I have 4 columns and I would like to keep only those rows where at least 2 columns have finite values.
Can somebody advise on how to achieve this?
The following should work:
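A minimal call, assuming your dataframe is named `df`:

```python
# keep only rows that have at least 2 non-NaN values
df = df.dropna(thresh=2)
```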
See the online docs for `DataFrame.dropna`.
What we are doing here is dropping any row that does not have at least 2 non-`NaN` values.

Example:
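A small illustration with hypothetical values; the row with only one non-`NaN` value is the one that gets dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-column frame: row 1 has only one non-NaN value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, np.nan, 9.0],
    "d": [10.0, 11.0, 12.0],
})

# Keep rows with at least 2 non-NaN values; row 1 is dropped.
print(df.dropna(thresh=2))
```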
EDIT
For the above example it works, but note that you would have to know the number of columns and set the `thresh` value appropriately. I originally thought it meant the number of `NaN` values, but it actually means the number of non-`NaN` values.

I had a slightly different problem, i.e. to filter out columns with more than a certain number of `NaN`s:
Assume you want to filter out columns with 3 or more `NaN`s:
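A sketch with a hypothetical frame in which column `c` holds 3 `NaN`s; the `thresh` value is derived from the number of rows, so columns with 3 or more `NaN`s (i.e. fewer than `len(df) - 2` non-`NaN` values) are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column c has 3 NaNs, a has 1, b has 2.
df = pd.DataFrame({
    "a": [1, 2, 3, np.nan],
    "b": [1, np.nan, 3, np.nan],
    "c": [np.nan, np.nan, np.nan, 4],
})

# Drop columns (axis=1) with fewer than len(df) - 2 non-NaN values,
# i.e. columns with 3 or more NaNs in this 4-row frame.
print(df.dropna(axis=1, thresh=len(df) - 2))
```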
Output (column `c` has been dropped as expected):
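For the hypothetical frame sketched above:

```
     a    b
0  1.0  1.0
1  2.0  NaN
2  3.0  3.0
3  NaN  NaN
```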
You have phrased 2 slightly different questions here. In the general case, they have different answers.
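For the first phrasing (keep rows where at least 2 columns have finite values), something like:

```python
df = df.dropna(thresh=2)
```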
This keeps rows with 2 or more non-null values.
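For the second phrasing (filter out rows with more than 2 `NaN`s), the threshold depends on the column count:

```python
df = df.dropna(thresh=df.shape[1] - 2)
```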
This filters out rows with more than 2 null values.
In your example dataframe of 4 columns, these operations are equivalent, since `df.shape[1] - 2 == 2`. However, you will notice discrepancies with dataframes which do not have exactly 4 columns.

Note `dropna` also has a `subset` argument should you wish to include only specified columns when applying a threshold. For example:
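With hypothetical columns `col1`, `col2` and `col3`, the threshold is then applied only to those columns:

```python
# keep rows where at least 2 of the three listed columns are non-null
df = df.dropna(subset=['col1', 'col2', 'col3'], thresh=2)
```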