Aggregation requires fun.aggregate: length used as default

Posted 2019-03-06 02:37

I have a file that I would like to reshape in R. These are the commands I am running:

x <- read.table("total.txt", sep=",", header=TRUE)
y <- melt(x, id=c("Hostname", "Date", "MetricType"))

When I issue this command, which is meant to combine the date with the hour, I get an error and the window hangs.

yy <- cast(y, Hostname + Date + variable ~ MetricType)

This is the error:

Aggregation requires fun.aggregate: length used as default
       ServerNa Date       MetricType   Hour   Value
19502  server1 01/05/2012  MemoryAVG    Hour5  41.830000
19503  server1 01/05/2012 CPUMaximum    Hour5   9.000000
19504  server1 01/05/2012 CPUAVG+Sev    Hour5   9.060000
19505  server1 01/05/2012     CPUAVG    Hour5  30.460000
19506  server1 01/05/2012         61    Hour5  63.400000
19507  server1 01/05/2012         60    Hour5  59.300000
19508  server2 01/05/2012  MemoryAVG    Hour5  10.690000
19509  server2 01/05/2012 CPUMaximum    Hour5   1.000000
19510  server2 01/05/2012 CPUAVG+Sev    Hour5   0.080000
19511  server2 01/05/2012     CPUAVG    Hour5   1.350000

Is there an easy way to do this without hanging the server?

When I used library(reshape2) and this:

yy <- acast(y, Hostname + Date + variable ~ MetricType, fun.aggregate=mean)

all the values turn into NA. I have no clue what is going on.

Tags: r reshape
1 answer
一夜七次
Reply #2 · 2019-03-06 03:19

Clarification: In the discussion below, I refer to dcast() rather than cast(). As Maiasaura notes in the comments, the function cast() from the reshape package has been replaced in the reshape2 package by two functions: dcast() (for data.frame output) and acast() (for array or matrix output). In any case, my comments about the need for a fun.aggregate argument hold equally for cast(), dcast(), and acast().


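The warning is issued because, for at least one combination of the categorical variables in the call to cast(), your data.frame y contains two or more rows of data. When that happens and no aggregation function is supplied, length() is used by default, so the result holds row counts rather than the original values. As documented in ?cast (or ?dcast):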

If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, ‘fun.aggregate’.
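
To check whether this condition holds in your own data, you can count the rows for each combination of the identifying variables. The following is a minimal sketch in base R; the toy data and its column names (Hostname, Date, MetricType, Value) are assumptions for illustration, not the contents of total.txt.

```r
## Toy data (hypothetical): one row per combination of the id variables
d <- expand.grid(Hostname = letters[1:2],
                 Date = as.Date("2012-01-05") + 0:1,
                 MetricType = LETTERS[3:4],
                 stringsAsFactors = FALSE)
d$Value <- seq_len(nrow(d))

## Repeat one combination on purpose
d2 <- rbind(d, d[1, ])

## Count the rows that share each combination of id variables
counts <- aggregate(Value ~ Hostname + Date + MetricType,
                    data = d2, FUN = length)
names(counts)[names(counts) == "Value"] <- "n"

## Combinations with n > 1 are the ones that force aggregation
dups <- counts[counts$n > 1, ]
print(dups)
```

For the melted data in the question, the same count would presumably be taken over Hostname, Date, variable and MetricType, since all four appear in the casting formula.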

Run the code below to see how this works, and how it can be remedied. In the last line of code, I use the fun.aggregate argument to tell dcast() to use mean() to combine values for any repeated combination of variables. In its place, you can put whatever aggregation function best fits your own situation.

library(reshape2)

## A toy dataset, with one row for each combination of variables
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))

## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1,])

## Runs without complaint
dcast(d, Hostname + Date ~ MetricType)

## Complains that an aggregation function is missing (and falls back to length)
dcast(d2, Hostname + Date ~ MetricType)

## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, fun.aggregate=mean)
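
As for the NA values reported with fun.aggregate=mean: one possible cause, offered only as a guess since the contents of total.txt are not shown, is that the melted value column is not numeric (for example, text or a factor). mean() returns NA for such input. The sketch below uses made-up values to show the behaviour and one way around it.

```r
## Hypothetical values stored as text, as can happen after reading a messy file
vals_chr <- c("41.83", "9", "9.06")

## mean() of a character vector returns NA (with a warning)
res_chr <- suppressWarnings(mean(vals_chr))

## Converting to numeric first gives a usable mean
res_num <- mean(as.numeric(vals_chr))

## A factor has to go through as.character() before as.numeric()
vals_fac <- factor(vals_chr)
res_fac <- mean(as.numeric(as.character(vals_fac)))

print(c(res_chr, res_num, res_fac))
```

Running str(y) on the melted data shows the type of its value column. If the column is numeric but contains missing values, mean() also returns NA for any group that includes one; passing na.rm = TRUE alongside fun.aggregate = mean may help in that case, since extra arguments to dcast() and acast() are handed on to the aggregation function.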