Error with custom aggregate function for a cast()

2020-03-01 19:01发布

I want to use R to summarize numerical data in a table with non-unique rownames to a result table with unique row-names with values summarized using a custom function. The summarization logic is: use the mean of values if the ratio of the maximum to the minimum value is < 1.5, else use median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.

# example table with non-unique row-names
tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
# melt
tab.melt <- melt(tab, id=1)
# function to summarize with logic: mean if max/min < 1.5, else median
summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}
# cast with summarized values
dcast(tab.melt, gene~variable, summarize)

The last line of code above results in an error notice.

Error in vapply(indices, fun, .default) : 
  values must be type 'logical',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In min(x) : no non-missing arguments to min; returning Inf

What am I doing wrong? Note that if the summarize function were to just return min(), or max(), there is no error, though there is the warning message about 'no non-missing arguments.' Thank you for any suggestion.

(The actual table I want to work with is a 200x10000 one.)

2条回答
虎瘦雄心在
2楼-- · 2020-03-01 19:16

dcast() tries to set the value of missing combination by default value.

you can specify this by fill argument, but if fill=NULL, then the value returned by fun(0-lenght vector) (i.e., summarize(numeric(0)) here) is used as default.

please see ?dcast

then, here is a workaround:

 dcast(tab.melt, gene~variable, summarize, fill=NaN)
查看更多
Summer. ? 凉城
3楼-- · 2020-03-01 19:31

Short answer: provide a value for fill as follows acast(tab.melt, gene~variable, summarize, fill=0)

Long answer: It appears your function gets wrapped as follows, before being passed to vapply in the vaggregate function (dcast calls cast which calls vaggregate which calls vapply):

fun <- function(i) {
    if (length(i) == 0) 
        return(.default)
    .fun(.value[i], ...)
}

To find out what .default should be, this code is executed

if (is.null(.default)) {
    .default <- .fun(.value[0])
}

i.e. .value[0] is passed to the function. min(x) or max(x) returns Inf or -Inf on when x is numeric(0). However, max(x)/min(x) returns NaN which has class logical. So when vapply is executed

vapply(indices, fun, .default)

with the default value being is of class logical (used as template by vapply), the function fails when starting to return doubles.

查看更多
登录 后发表回答