apply() is giving NA values for every column

Posted 2019-02-14 19:28

Question:

I've been having this strange problem with apply lately. Consider the following example:

set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
head(df)
  speed dist foo
1     4    2   E
2     4   10   E
3     7    4   B
4     7   22   E
5     8   16   D
6     9   10   C

I want to use apply to apply a function fun (say, mean) to each column of that data.frame. If the data.frame contains only numeric values, I do not have any problem:

apply(cars, 2, mean)
speed  dist 
15.40 42.98 

But when I try it with my data.frame, which contains both numeric and non-numeric data, it seems to fail:

apply(df, 2, mean)
speed  dist   foo 
   NA    NA    NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA

Of course, I was expecting to get NA for the character column, but I would like to get values for the numeric columns anyway.

sapply(df, class)
    speed      dist       foo 
"numeric" "numeric"  "factor" 

Any pointers would be appreciated as I'm feeling like I'm missing something very obvious here!

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Answer 1:

The first sentence of the Details section of ?apply says:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

Matrices can only be of a single type in R. When the data frame is coerced to a matrix, everything ends up as character if there is even a single character or factor column, such as foo here.
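As a minimal sketch of that coercion, the following rebuilds the data frame from the question and calls as.matrix directly, which is the conversion ?apply describes. The variable name m is only for illustration.

```r
# Rebuild the example data frame from the question (cars is a built-in dataset).
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# apply() coerces a data frame with as.matrix() before looping over it.
m <- as.matrix(df)
typeof(m)                  # the whole matrix is stored as character

# The numeric-only data frame keeps a numeric storage type.
typeof(as.matrix(cars))
```

Because every cell of m is a string, mean() has nothing numeric to work with in any column, which is why all three results are NA.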

I guess I owe you a description of an alternative, so here you go. Data frames are really just lists of columns, so if you want to apply a function to each column, use lapply or sapply instead.
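One way this could look, assuming you only care about the numeric columns, is to select them first and then call sapply on what remains. The names num_cols and col_means are made up for this sketch.

```r
# Rebuild the example data frame from the question.
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# Logical vector marking which columns are numeric.
num_cols <- vapply(df, is.numeric, logical(1))

# sapply() iterates over the list of columns, so no matrix coercion happens.
col_means <- sapply(df[num_cols], mean)
col_means
```

This should give the same values as apply(cars, 2, mean), and foo is simply left out of the result rather than reported as NA.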



Answer 2:

apply works on a matrix, and a matrix must be of all one type. So df is being transformed into a matrix, and since it contains a non-numeric column (the factor foo), all the columns become character.

> apply(df, 2, class)
      speed        dist         foo 
"character" "character" "character" 

To get what you want, check out the colwise and numcolwise functions in plyr.

> numcolwise(mean)(df)
  speed  dist
1  15.4 42.98


Answer 3:

You are applying a function over the columns of a data.frame. Since a data.frame is a list, you can use lapply or sapply instead of apply:

sapply(df, mean)

speed  dist   foo 
15.40 42.98    NA 
Warning message:
In mean.default(X[[3L]], ...) :
  argument is not numeric or logical: returning NA

And you can remove the warning message by using an anonymous function that tests for class numeric before calculating the mean:

sapply(df, function(x)ifelse(is.numeric(x), mean(x), NA))

speed  dist   foo 
15.40 42.98    NA 


Tags: r apply