R: Why does mean(NA, na.rm = TRUE) return NaN

Posted 2019-05-27 09:46

Question:

When computing the mean of a vector that contains only NAs, we get NaN if na.rm = TRUE. Why is this? Is the logic flawed, or is there something I'm missing? Surely it would make more sense to return NA rather than NaN?

A quick example:

mean(NA, na.rm = TRUE)
#[1] NaN

mean(rep(NA, 10), na.rm = TRUE)
#[1] NaN

Answer 1:

It is a pity that ?mean says nothing about this. My comment only told you that applying mean to an empty "numeric" vector gives NaN, without further reasoning. Rui Barradas's comment tried to explain it but was not accurate: division by 0 is not always NaN; it can also be Inf or -Inf. I once discussed this in "R: element-wise matrix division". However, we are getting close. Although mean(x) is not implemented as sum(x) / length(x), that mathematical identity is what explains the NaN.

From ?sum:

 *NB:* the sum of an empty set is zero, by definition.

So sum(numeric(0)) is 0. As length(numeric(0)) is 0, mean(numeric(0)) is 0 / 0 which is NaN.
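The arithmetic above can be checked directly at the console. This is only a small sketch; the variable name x is chosen here for illustration, and the comments show what base R is expected to return.

```r
# An empty numeric vector: what is left once every NA has been dropped
x <- numeric(0)

sum(x)               # 0, the empty sum
length(x)            # 0
sum(x) / length(x)   # 0 / 0, which is NaN
mean(x)              # NaN as well

# Division by zero is NaN only when the numerator is also zero
1 / 0    # Inf
-1 / 0   # -Inf
0 / 0    # NaN
```

The last three lines show why "division by 0" alone does not explain the result: it is specifically 0 / 0 that yields NaN.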



Answer 2:

From the documentation for mean:

na.rm a logical value indicating whether NA values should be stripped before the computation proceeds.

By this logic, all NAs are removed before the mean is computed. In your case every element is NA, so after removal you are taking the mean of an empty vector, and NaN is returned.
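The stripping step can be imitated by hand to see what mean is left with. The sketch below also includes mean_or_na, a hypothetical helper that is not part of base R, for anyone who would rather get NA back when nothing remains; treat it as one possible workaround, not a recommended standard.

```r
x <- rep(NA, 10)

# Roughly what na.rm = TRUE does before the mean is computed
kept <- x[!is.na(x)]
length(kept)   # 0
mean(kept)     # NaN

# Hypothetical helper (an assumption for illustration, not a base R function):
# return NA when there is no non-missing value to average.
# Note that is.na() is also TRUE for NaN, and that an empty input returns NA too.
mean_or_na <- function(x) {
  if (all(is.na(x))) {
    return(NA_real_)
  }
  mean(x, na.rm = TRUE)
}

mean_or_na(x)              # NA
mean_or_na(c(1, NA, 3))    # 2
```

Whether NA or NaN is the better answer for an all-missing vector is a design choice; the helper simply makes that choice explicit at the call site.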



Tags: r nan mean na