I have this dataframe:
set.seed(50)
data <- data.frame(age=c(rep("juv", 10), rep("ad", 10)),
                   sex=c(rep("m", 10), rep("f", 10)),
                   size=c(rep("large", 10), rep("small", 10)),
                   length=rnorm(20),
                   width=rnorm(20),
                   height=rnorm(20))
data$length[sample(1:20, size=8, replace=FALSE)] <- NA
data$width[sample(1:20, size=8, replace=FALSE)] <- NA
data$height[sample(1:20, size=8, replace=FALSE)] <- NA
   age sex  size      length       width      height
1  juv   m large          NA -0.34992735  0.10955641
2  juv   m large -0.84160374          NA -0.41341885
3  juv   m large  0.03299794 -1.58987765          NA
4  juv   m large          NA          NA          NA
5  juv   m large -1.72760411          NA  0.09534935
6  juv   m large -0.27786453  2.66763339  0.49988990
7  juv   m large          NA          NA          NA
8  juv   m large -0.59091244 -0.36212039 -1.65840096
9  juv   m large          NA  0.56874633          NA
10 juv   m large          NA  0.02867454 -0.49068623
11  ad   f small  0.29520677  0.19902339          NA
12  ad   f small  0.55475223 -0.85142228  0.33763747
13  ad   f small          NA          NA -1.96590570
14  ad   f small  0.19573384  0.59724896 -2.32077461
15  ad   f small -0.45554055 -1.09604786          NA
16  ad   f small -0.36285547  0.01909655  1.16695158
17  ad   f small -0.15681338          NA          NA
18  ad   f small          NA          NA          NA
19  ad   f small          NA  0.40618657 -1.33263085
20  ad   f small -0.32342568          NA -0.13883976
I'm trying to make a function that counts the number of NA values of each of `length`, `width` and `height` at each level of the three factors in the dataframe. I've tried this:
exploreMissingValues <- function(dataframe, factors, variables){
  library(plyr)
  Variables <- list(variables)
  llply(Variables, function(x) ddply(dataframe, .(factors),
                                     summarise,
                                     number.of.NA=length(x[is.na(x)])))
}
exploreMissingValues(data,
                     c("age", "sex", "size"),
                     c("length", "width", "height"))
...but this gives an error. How can I get this function to return the number of NA values at each level of the factors in the dataframe?
Looking for something like this...???
Use `aggregate`:
You could also `apply` this to your dataframe, by each factor, to get `NA` counts for all of the dimension measures for each factor.
To do this all as one function, nest `nacheck` in something and then `lapply`:
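The steps above can be sketched in base R roughly as follows. The body of `nacheck` is an assumption (the helper is named above, but its definition is not shown), and `na_by_factor` is a hypothetical wrapper name used only for illustration:

```r
# Rebuild the example data from the question.
set.seed(50)
data <- data.frame(age = c(rep("juv", 10), rep("ad", 10)),
                   sex = c(rep("m", 10), rep("f", 10)),
                   size = c(rep("large", 10), rep("small", 10)),
                   length = rnorm(20),
                   width = rnorm(20),
                   height = rnorm(20))
data$length[sample(1:20, size = 8, replace = FALSE)] <- NA
data$width[sample(1:20, size = 8, replace = FALSE)] <- NA
data$height[sample(1:20, size = 8, replace = FALSE)] <- NA

# Assumed helper: count the NA values in a vector.
nacheck <- function(x) sum(is.na(x))

measures <- c("length", "width", "height")

# NA counts of every measure at each level of a single factor.
aggregate(data[measures], by = data["age"], FUN = nacheck)

# Hypothetical wrapper: repeat the aggregate() call for each factor via lapply().
na_by_factor <- function(df, factors, variables) {
  out <- lapply(factors, function(f) {
    aggregate(df[variables], by = df[f], FUN = nacheck)
  })
  names(out) <- factors
  out
}

result <- na_by_factor(data, c("age", "sex", "size"), measures)
result
```

`result` is a list with one data frame per factor. Passing `data[measures]` and `data["age"]` (rather than a formula) matters here, because the formula interface of `aggregate` drops rows with `NA` by default.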
A `data.table` approach:
and the `plyr` equivalent using `colwise` and `ddply`
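A minimal sketch of what the `plyr` version might look like, assuming a hypothetical helper `count_na` and reusing the function name from the question purely for illustration. It relies on `ddply` accepting a character vector of grouping columns and on `colwise` accepting a character vector of columns to process:

```r
library(plyr)

# Rebuild the example data from the question.
set.seed(50)
data <- data.frame(age = c(rep("juv", 10), rep("ad", 10)),
                   sex = c(rep("m", 10), rep("f", 10)),
                   size = c(rep("large", 10), rep("small", 10)),
                   length = rnorm(20),
                   width = rnorm(20),
                   height = rnorm(20))
data$length[sample(1:20, size = 8, replace = FALSE)] <- NA
data$width[sample(1:20, size = 8, replace = FALSE)] <- NA
data$height[sample(1:20, size = 8, replace = FALSE)] <- NA

# Hypothetical helper: count the NA values in a vector.
count_na <- function(x) sum(is.na(x))

# Sketch of the question's function: one row per combination of factor levels,
# one NA count per measured variable.
exploreMissingValues <- function(dataframe, factors, variables) {
  ddply(dataframe, factors, colwise(count_na, variables))
}

res <- exploreMissingValues(data,
                            c("age", "sex", "size"),
                            c("length", "width", "height"))
res
```

Compared with the attempt in the question, the column names are passed as plain character vectors rather than through `.(factors)`, which would look for a column literally called `factors`.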
You could always use a vector of column names for the `by` components.
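One reading of this, sketched with `data.table`: `by` can take a character vector of column names, and `.SDcols` does the same for the measured columns. The names `by_cols`, `measure_cols` and `na_counts` are hypothetical, chosen only for this example:

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(50)
data <- data.frame(age = c(rep("juv", 10), rep("ad", 10)),
                   sex = c(rep("m", 10), rep("f", 10)),
                   size = c(rep("large", 10), rep("small", 10)),
                   length = rnorm(20),
                   width = rnorm(20),
                   height = rnorm(20))
data$length[sample(1:20, size = 8, replace = FALSE)] <- NA
data$width[sample(1:20, size = 8, replace = FALSE)] <- NA
data$height[sample(1:20, size = 8, replace = FALSE)] <- NA

dt <- as.data.table(data)

# Grouping and measured columns held as character vectors.
by_cols <- c("age", "sex", "size")
measure_cols <- c("length", "width", "height")

# Count NA values per measured column within each combination of factor levels.
na_counts <- dt[, lapply(.SD, function(x) sum(is.na(x))),
                by = by_cols, .SDcols = measure_cols]
na_counts
```

Keeping the names in vectors means the same call can be wrapped in a function that takes the factors and variables as arguments, as the question asks.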