When computing the mean of a vector that contains only NAs, we get NaN if na.rm = TRUE. Why is this? Is this flawed logic, or is there something I'm missing? Surely it would make more sense to return NA than NaN?
A quick example:
mean(NA, na.rm = TRUE)
#[1] NaN
mean(rep(NA, 10), na.rm = TRUE)
#[1] NaN
It is a bit of a pity that ?mean does not say anything about this. My comment only told you that applying mean to an empty "numeric" results in NaN, without further reasoning. Rui Barradas's comment tried to explain this but was not accurate: division by 0 is not always NaN; it can also be Inf or -Inf. I once discussed this in "R: element-wise matrix division". However, we are getting close. Although mean(x) is not implemented as sum(x) / length(x), this mathematical identity really does explain the NaN.
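To make the point about division by 0 concrete, here is a small sketch of the three possible outcomes. The variable names are only illustrative.

```r
# Division by zero in R depends on the sign of the numerator.
pos  <-  1 / 0   # Inf
neg  <- -1 / 0   # -Inf
zero <-  0 / 0   # NaN: only zero divided by zero is undefined

print(c(pos, neg, zero))

is.nan(zero)     # TRUE
is.nan(pos)      # FALSE: Inf is not NaN
```

So "division by 0" alone does not explain the result; it is specifically the 0 / 0 case that yields NaN.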
From ?sum:
*NB:* the sum of an empty set is zero, by definition.
So sum(numeric(0)) is 0. As length(numeric(0)) is also 0, mean(numeric(0)) is 0 / 0, which is NaN.
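The reasoning above can be checked step by step. This is only a sketch of the arithmetic, not of how mean is implemented internally; the names empty, total, n, ratio, and m are illustrative.

```r
empty <- numeric(0)

total <- sum(empty)      # 0, the sum of an empty set
n     <- length(empty)   # 0
ratio <- total / n       # 0 / 0, which is NaN
m     <- mean(empty)     # NaN, consistent with the ratio

print(c(total = total, n = n, ratio = ratio, mean = m))
```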
From the mean documentation:
na.rm a logical value indicating whether NA values should be
stripped before the computation proceeds.
With this logic, all NAs are removed before mean is applied. In your case, you are applying mean to nothing (all the NAs have been removed), so NaN is returned.
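The removal step can be reproduced by hand, as sketched below. If you would rather get NA than NaN for an all-NA input, one option is a small wrapper; mean_or_na is a hypothetical helper written for this illustration, not a function provided by R.

```r
x <- rep(NA, 10)

# What na.rm = TRUE effectively does: drop the NAs first.
kept <- x[!is.na(x)]
length(kept)             # 0: nothing is left
mean(kept)               # NaN, same as mean(x, na.rm = TRUE)

# Hypothetical helper: return NA when there is no non-missing value.
# Note that all(is.na(x)) is also TRUE for a zero-length x,
# so an empty input returns NA as well.
mean_or_na <- function(x) {
  if (all(is.na(x))) {
    return(NA_real_)
  }
  mean(x, na.rm = TRUE)
}

mean_or_na(x)            # NA
mean_or_na(c(1, NA, 3))  # 2
```

Whether NA or NaN is the better answer for an all-NA input depends on what the downstream code expects, so treat the wrapper as one possible convention.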