I want to use R to summarize numerical data in a table with non-unique rownames to a result table with unique row-names with values summarized using a custom function. The summarization logic is: use the mean of values if the ratio of the maximum to the minimum value is < 1.5, else use median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.
    library(reshape2)

    # example table with non-unique row-names
    tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))

    # melt
    tab.melt <- melt(tab, id=1)

    # function to summarize with logic: mean if max/min < 1.5, else median
    summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}

    # cast with summarized values
    dcast(tab.melt, gene~variable, summarize)
The last line of code above results in an error notice.
    Error in vapply(indices, fun, .default) :
      values must be type 'logical', but FUN(X[[1]]) result is type 'double'
    In addition: Warning messages:
    1: In max(x) : no non-missing arguments to max; returning -Inf
    2: In min(x) : no non-missing arguments to min; returning Inf
What am I doing wrong? Note that if the summarize function were to just return min(), or max(), there is no error, though there is the warning message about 'no non-missing arguments.' Thank you for any suggestion.
(The actual table I want to work with is a 200x10000 one.)
dcast() fills in missing combinations with a default value.
You can specify it with the fill argument, but if fill=NULL, the value returned by fun applied to a zero-length vector (i.e., summarize(numeric(0)) here) is used as the default.
Please see ?dcast.
Here is a workaround:
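To see what that default looks like for the function in the question, you can call it on an empty vector yourself. This is only a small base-R check, not dcast()'s own code; the name default_value is made up for illustration.

```r
# Same function as in the question
summarize <- function(x) {
  ifelse(max(x) / min(x) < 1.5, mean(x), median(x))
}

# The function applied to a zero-length vector.
# max() and min() warn on empty input, so the warnings are suppressed here.
default_value <- suppressWarnings(summarize(numeric(0)))

print(default_value)          # NA
print(typeof(default_value))  # "logical"
```

The result is a logical NA rather than a double, which matches the 'logical' mentioned in the error message.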
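A minimal sketch of such a workaround, assuming a missing gene/variable combination should be reported as NA. Passing an explicit double-typed fill means the default is no longer derived from summarize(numeric(0)). The set.seed() call and the explicit value.var are additions for reproducibility and are not part of the question's code.

```r
library(reshape2)

set.seed(1)  # only to make the example reproducible
tab <- data.frame(gene = rep(letters[1:3], each = 3),
                  s1 = runif(9), s2 = runif(9))
tab.melt <- melt(tab, id.vars = "gene")

# Same function as in the question
summarize <- function(x) {
  ifelse(max(x) / min(x) < 1.5, mean(x), median(x))
}

# Explicit fill of type double (NA_real_ is an assumption; 0 or NaN would
# also be doubles), so the default has the same type as the real results
res <- dcast(tab.melt, gene ~ variable, summarize,
             value.var = "value", fill = NA_real_)
print(res)
```

In this example every gene/variable combination is present, so the fill value never appears in the result; it only fixes the type of the default.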
Short answer: provide a value for fill, as follows:

    acast(tab.melt, gene~variable, summarize, fill=0)
Long answer: It appears your function gets wrapped as follows, before being passed to vapply in the vaggregate function (dcast calls cast which calls vaggregate which calls vapply):
To find out what .default should be, this code is executed
i.e. .value[0] is passed to the function. min(x) and max(x) return Inf and -Inf, respectively, when x is numeric(0). However, max(x)/min(x) is then NaN, the comparison NaN < 1.5 evaluates to NA, and ifelse() with an NA test returns NA, which has class logical. So when vapply is executed
with a default value of class logical (used as the template by vapply), the function fails as soon as it starts returning doubles.
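The type mismatch can be reproduced in base R without reshape2. The sketch below only imitates the mechanism described above and is not the package's internal code; groups, template and summarize_safe are hypothetical names introduced for illustration.

```r
# Same function as in the question
summarize <- function(x) {
  ifelse(max(x) / min(x) < 1.5, mean(x), median(x))
}

# Hypothetical stand-in for the per-cell groups of values
groups <- list(a = c(1.0, 1.2, 1.3), b = c(1, 5, 9))

# Template obtained from an empty vector: a logical NA
template <- suppressWarnings(summarize(numeric(0)))

# vapply() requires every result to match the template's type,
# so a logical template combined with double results is an error
failed <- tryCatch({
  vapply(groups, summarize, template)
  FALSE
}, error = function(e) TRUE)
print(failed)

# With a double template the same call succeeds
ok <- vapply(groups, summarize, NA_real_)
print(ok)

# Alternative: a variant that returns a double for empty input,
# so the derived default is already of the right type
summarize_safe <- function(x) {
  if (length(x) == 0) return(NA_real_)
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}
print(typeof(summarize_safe(numeric(0))))
```

A function like summarize_safe should also avoid the max()/min() warnings, since it never calls them on an empty vector.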