In R, does NA == NA?

Posted 2020-05-06 07:46

Question:

identical(NA, NA) returns TRUE, but the following code filters the rows with an NA birth_year out of the data frame:

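library(tidyverse)  # loads dplyr, which provides filter() and the starwars data
filter(starwars, birth_year == birth_year)  # rows whose birth_year is NA are dropped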

If NA does equal NA, the filtered starwars data frame above should include the rows whose birth_year is NA. Why doesn't it?

Answer 1:

NA is identical to NA, but it is not equal to it. Running NA == NA returns NA rather than TRUE, because == treats missing values as non-comparable. From the documentation for identical():

A call to identical is the way to test exact equality in if and while statements, as well as in logical expressions that use && or ||. In all these applications you need to be assured of getting a single logical value.

Users often use the comparison operators, such as == or !=, in these situations. It looks natural, but it is not what these operators are designed to do in R. They return an object like the arguments. If you expected x and y to be of length 1, but it happened that one of them was not, you will not get a single FALSE. Similarly, if one of the arguments is NA, the result is also NA. In either case, the expression if(x == y).... won't work as expected.
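Both points from that passage can be seen in a small base-R sketch (no packages needed). The names x, y and msg are only for illustration; tryCatch() is used just so the script keeps running after the error that if() raises on an NA condition.

```r
x <- NA
y <- NA

x == y           # NA: a comparison with a missing value is itself missing
identical(x, y)  # TRUE: identical() always returns a single TRUE or FALSE

# if (x == y) stops with "missing value where TRUE/FALSE needed";
# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(
  if (x == y) "equal" else "not equal",
  error = function(e) conditionMessage(e)
)
print(msg)

# identical() gives a definite answer, so this branch is well defined
if (identical(x, y)) "identical" else "not identical"

# The length problem from the same passage: == is vectorised,
# so it returns one value per element rather than a single FALSE
c(1, 2) == c(1, 3)           # TRUE FALSE
identical(c(1, 2), c(1, 3))  # FALSE
```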

And from the documentation for ==:

Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.
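A quick base-R check of the rule quoted above. The vector name cmp is only for illustration; is.na() is the usual way to test for a missing value, and it also returns TRUE for NaN.

```r
# Each comparison below involves NA or NaN, so each result is NA
cmp <- c(
  NA == 1,
  NA != NA,
  NaN == NaN,
  NA_real_ > 0
)
print(cmp)  # NA NA NA NA

# To ask "is this value missing?", use is.na() instead of ==
is.na(c(1, NA, NaN))  # FALSE TRUE TRUE
```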

The rationale is that a missing value is conceptually an unknown value, so two missing values cannot be assumed to be the same. They could stand for very different values; we just don't know what those values are.
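The same reasoning shows up in R's three-valued logic (documented under ?Logic): if the result would be the same whatever the missing value turned out to be, R returns that result; otherwise it returns NA. A short sketch:

```r
NA | TRUE   # TRUE:  TRUE OR anything is TRUE
NA & FALSE  # FALSE: FALSE AND anything is FALSE
NA | FALSE  # NA:    the answer depends on the unknown value
NA & TRUE   # NA:    likewise
NA^0        # 1:     any number raised to the power 0 is 1
NA == NA    # NA:    two unknown values may or may not be equal
```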

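If you want to keep those rows, one alternative is to add | is.na(birth_year) to the condition: filter(starwars, birth_year == birth_year | is.na(birth_year)). Because NA | TRUE evaluates to TRUE, rows with a missing birth_year then pass the filter.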
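Below is a base-R sketch of the same fix, so it runs without tidyverse. It assumes a small made-up data frame (the names df, only_known and all_rows and its values are hypothetical) in place of starwars; base subset() treats an NA condition as FALSE, which matches how dplyr::filter() drops rows whose condition is NA.

```r
# Hypothetical toy data standing in for starwars; only birth_year matters
df <- data.frame(
  name = c("a", "b", "c"),
  birth_year = c(19, NA, 112)
)

# The comparison alone is NA for the row with a missing birth_year
df$birth_year == df$birth_year  # TRUE NA TRUE

# subset() drops rows where the condition is NA, just like filter()
only_known <- subset(df, birth_year == birth_year)
nrow(only_known)  # 2

# NA | TRUE is TRUE, so OR-ing in is.na() keeps the missing row too
all_rows <- subset(df, birth_year == birth_year | is.na(birth_year))
nrow(all_rows)  # 3
```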



Tags: r na