I have a file that I would like to reshape using R. These are the commands that I am running:
x <- data.frame(read.table("total.txt", sep=",", header=TRUE))
y <- melt(x, id=c("Hostname", "Date", "MetricType"))
When I issue this command, basically to combine date with hour, I get an error and the window hangs:
yy <- cast(y, Hostname + Date + variable ~ MetricType)
This is the error:
Aggregation requires fun.aggregate: length used as default
A sample of the data:
      ServerNa Date       MetricType Hour  Value
19502 server1  01/05/2012 MemoryAVG  Hour5 41.830000
19503 server1  01/05/2012 CPUMaximum Hour5  9.000000
19504 server1  01/05/2012 CPUAVG+Sev Hour5  9.060000
19505 server1  01/05/2012 CPUAVG     Hour5 30.460000
19506 server1  01/05/2012 61         Hour5 63.400000
19507 server1  01/05/2012 60         Hour5 59.300000
19508 server2  01/05/2012 MemoryAVG  Hour5 10.690000
19509 server2  01/05/2012 CPUMaximum Hour5  1.000000
19510 server2  01/05/2012 CPUAVG+Sev Hour5  0.080000
19511 server2  01/05/2012 CPUAVG     Hour5  1.350000
Is there an easy way to do this without hanging the server?
When I used library(reshape2) and this:
yy <- acast(y, Hostname + Date + variable ~ MetricType, fun.aggregate=mean)
all the values turn into NA. I have no clue what is going on.
Clarification: In the discussion below, I refer to dcast() rather than cast(). As Maiasaura notes in the comments, the function cast() from the reshape package has been replaced in the reshape2 package by two functions: dcast() (for data.frame output) and acast() (for array or matrix output). In any case, my comments about the need for a fun.aggregate argument hold equally for cast(), dcast(), and acast().
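To make the difference between the two replacement functions concrete, here is a minimal sketch. The toy data and the names wide_df and wide_mat are made up for illustration and are not taken from the question.

```r
library(reshape2)

# Toy data (hypothetical values): one row per Hostname/MetricType pair
d <- data.frame(Hostname   = c("a", "a", "b", "b"),
                MetricType = c("CPU", "Mem", "CPU", "Mem"),
                Value      = c(1, 2, 3, 4))

# Same formula, two output types
wide_df  <- dcast(d, Hostname ~ MetricType, value.var = "Value")
wide_mat <- acast(d, Hostname ~ MetricType, value.var = "Value")

is.data.frame(wide_df)  # dcast() gives a data.frame with a Hostname column
is.matrix(wide_mat)     # acast() gives a matrix with Hostname as row names
```

Naming value.var explicitly is optional here, but it avoids relying on reshape2 guessing which column holds the values.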
The message is produced because, for at least one combination of the categorical variables in the call to cast(), your data.frame y must contain at least two rows of data. As documented in ?cast (or ?dcast):
If the combination of variables you supply does not uniquely
identify one row in the original data set, you will need to supply
an aggregating function, ‘fun.aggregate’.
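One way to check whether this applies to your own data is to look for repeated combinations before casting. The following is only a sketch in base R: the toy data frame mimics the melted layout from the question, its values (including the repeated row) are invented, and key_cols and dup_rows are hypothetical names.

```r
# Toy melted data (hypothetical values); rows 1 and 4 share the same combination
y <- data.frame(Hostname   = c("server1", "server1", "server2", "server1"),
                Date       = rep("01/05/2012", 4),
                MetricType = c("CPUAVG", "MemoryAVG", "CPUAVG", "CPUAVG"),
                variable   = rep("Hour5", 4),
                value      = c(30.46, 41.83, 1.35, 31.00))

# Every column that appears on either side of the casting formula
key_cols <- c("Hostname", "Date", "variable", "MetricType")

# Flag all rows whose combination occurs more than once
# (fromLast = TRUE also marks the first occurrence of each repeat)
dup_rows <- duplicated(y[key_cols]) | duplicated(y[key_cols], fromLast = TRUE)

# Inspect the offending rows
y[dup_rows, ]
```

If this returns no rows, each combination is unique and no aggregation function should be needed.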
Run the code below to see how this works, and how it can be remedied. In the last line of code, I use the fun.aggregate argument to tell dcast() to use mean() to combine values for any repeated combination of variables. In its place, you can put whatever aggregation function best fits your own situation.
library(reshape2)
## A toy dataset, with one row for each combination of variables
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))
## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1,])
## Runs without complaint
dcast(d, Hostname + Date ~ MetricType)
## Emits a message asking for an aggregation function and falls back to length
dcast(d2, Hostname + Date ~ MetricType)
## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, fun.aggregate=mean)
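The question also reports that every value became NA once fun.aggregate=mean was supplied. One possible cause, which the question does not confirm, is that some of the values being averaged are missing; another would be a value column that was read in as text rather than numbers. For the first case, here is a small sketch with invented data (d3, with_na and without_na are hypothetical names) showing that extra arguments such as na.rm are passed through to the aggregation function.

```r
library(reshape2)

# Toy data (hypothetical values): a repeated combination where one value is NA
d3 <- data.frame(Hostname   = c("a", "a", "b"),
                 MetricType = c("C", "C", "C"),
                 Value      = c(1, NA, 3))

# mean() returns NA as soon as any value in a group is missing
with_na <- dcast(d3, Hostname ~ MetricType, fun.aggregate = mean,
                 value.var = "Value")

# Extra arguments are handed on to the aggregation function,
# so na.rm = TRUE makes mean() skip the missing value
without_na <- dcast(d3, Hostname ~ MetricType, fun.aggregate = mean,
                    value.var = "Value", na.rm = TRUE)
```

If the values are stored as text instead, converting them with as.numeric() before casting would be the thing to try.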