identical(NA, NA) returns TRUE, but the following code filters NA out of the data frame:
library(tidyverse)
filter(starwars, birth_year == birth_year)
If NA does equal NA, the filtered starwars data frame above should include birth years of NA. Why doesn't it?
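NA is identical to NA, but doesn't equal it. If you run NA == NA, the result is NA, because == treats missing values as non-comparable and returns NA rather than TRUE or FALSE. filter() keeps only the rows where the condition is TRUE, so rows where it evaluates to NA are dropped. From the identical documentation: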
A call to identical is the way to test exact equality in if and while
statements, as well as in logical expressions that use && or ||. In
all these applications you need to be assured of getting a single
logical value.
Users often use the comparison operators, such as == or !=, in these
situations. It looks natural, but it is not what these operators are
designed to do in R. They return an object like the arguments. If you
expected x and y to be of length 1, but it happened that one of them
was not, you will not get a single FALSE. Similarly, if one of the
arguments is NA, the result is also NA. In either case, the expression
if(x == y).... won't work as expected.
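As a rough illustration of the passage quoted above, the base R sketch below compares the two approaches. The names x, y, eq, same and res are hypothetical and exist only for this example; tryCatch() is used here just to capture the failure of if() without stopping the script.

```r
# Hypothetical values for illustration: both are missing.
x <- NA
y <- NA

# == propagates the missing value instead of answering TRUE or FALSE.
eq <- x == y

# identical() always returns a single TRUE or FALSE.
same <- identical(x, y)

# if() needs a single TRUE/FALSE, so an NA condition raises an error.
# tryCatch() turns that error into the string "error" so the script keeps going.
res <- tryCatch(
  if (x == y) "equal" else "different",
  error = function(e) "error"
)

print(eq)    # NA
print(same)  # TRUE
print(res)   # "error"
```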
And from the documentation for ==:
Missing values (NA) and NaN values are regarded as non-comparable even
to themselves, so comparisons involving them will always result in NA.
Missing values can also result when character strings are compared and
one is not valid in the current collation locale.
The rationale is that missing values, at a conceptual level, are not the same as one another. They could potentially represent very different values, but we just don't know what those values are.
An alternative in this situation is to add | is.na(birth_year) to the filter condition, so that rows with a missing birth year are kept.
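A minimal sketch of that alternative, assuming dplyr is installed (the starwars data set ships with it). The names dropped_na and kept_na are made up for this example. It works because NA | TRUE evaluates to TRUE, so the rows that the comparison alone would drop now pass the filter.

```r
library(dplyr)  # provides filter() and the starwars data set

# Original condition: rows where the comparison evaluates to NA are dropped.
dropped_na <- filter(starwars, birth_year == birth_year)

# Extended condition: NA | TRUE is TRUE, so rows with a missing birth year stay.
kept_na <- filter(starwars, birth_year == birth_year | is.na(birth_year))

# The first filter keeps exactly the rows with a known birth year.
nrow(dropped_na) == sum(!is.na(starwars$birth_year))  # TRUE

# The second filter keeps every row.
nrow(kept_na) == nrow(starwars)                       # TRUE
```

If the goal is only to keep or drop missing values, filtering on is.na(birth_year) or !is.na(birth_year) directly states the intent more clearly than comparing the column to itself.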