My data looks like this:
library(tidyverse)
df <- tribble(
~a, ~b, ~c,
1, 2, 3,
1, NA, 3,
NA, 2, 3
)
I can remove all NA observations with drop_na():
df %>% drop_na()
Or remove all NA observations in a single column (a, for example):
df %>% drop_na(a)
Why can't I just use a regular != filter pipe?
df %>% filter(a != NA)
Why do we have to use a special function from tidyr to remove NAs?
For example, you can use:
df %>% filter(!is.na(a))
to remove the rows where column a is NA.
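As a quick check of the difference, here is a minimal sketch using the same data as in the question. The names res_neq and res_isna are only illustrative. filter() keeps rows where the condition is TRUE, and a != NA is NA for every row, so nothing is kept:

```r
library(dplyr)

# Same data as in the question
df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# a != NA evaluates to NA for every row, and filter() drops rows
# whose condition is NA, so this returns zero rows
res_neq <- df %>% filter(a != NA)
nrow(res_neq)

# is.na() always returns TRUE or FALSE, so negating it keeps
# exactly the rows where a is not missing
res_isna <- df %>% filter(!is.na(a))
nrow(res_isna)
```

The first call silently returns an empty tibble rather than raising an error, which is why the mistake is easy to miss.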
From @Ben Bolker:
[T]his has nothing specifically to do with dplyr::filter()
From @Marat Talipov:
[A]ny comparison with NA, including NA==NA, will return NA
From a related answer by @farnsy:
The == operator does not treat NA's as you would expect it to.
Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."
R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
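The behaviour described in these comments can be seen in base R, without dplyr. This is a small sketch, and the vector x and the result names are made up for illustration:

```r
# A vector with one missing value (illustrative)
x <- c(1, 1, NA)

# Every comparison against NA is itself NA, even NA == NA
eq_na  <- NA == NA
gt_na  <- 3 > NA
neq_na <- x != NA

# is.na() is the test that actually answers "is this value missing?"
missing_flags <- is.na(x)

print(eq_na)
print(gt_na)
print(neq_na)
print(missing_flags)
```

Because x != NA is NA in every position, any filtering built on it has nothing TRUE to keep, while is.na(x) gives a usable logical vector.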