From what I read, `*cast` operations in reshape2 lost their `result_variable` feature. Hadley hints at using plyr for this purpose (appending multiple result columns to the input data frame). How would I realize the documentation example ...
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
cast(aqm, month ~ variable + result_variable, range)
using reshape2 (`dcast`) and plyr (`ddply`)?
Here is a `dplyr` solution making use of the amazing `%>%` operator. It also uses the base `reshape` function, which is often underused (IMHO). The code is self-explanatory.

I think that the other answers should have you covered in terms of how to use "plyr" or "dplyr" (and I would encourage you to continue looking in that direction).
For fun, here's a wrapper around `dcast` to let you specify multiple functions. It doesn't work with functions that return multiple values (like `range`) and it requires you to use a named list of functions.

It looks like a bit of a mess, but the result is that you get to stick with the same syntax you're familiar with, while getting to use multiple aggregation functions. The "names" for the `funs` argument are used as the suffixes in the resulting names. Anonymous functions can be specified as expected, for example `maxSq = function(x) max(x)^2`.

This question has multiple answers, due to the flexibility of the 'reshape2' and 'plyr' packages. I will show one of the easiest examples to understand here:
Step 1: Let's break it down into steps. First, let's leave the definition of 'aqm' alone and work from the melted data. This will make the example easier to understand.
Step 2: Now, we want to replace the 'value' column with 'min' and 'max' columns. We can accomplish this with the `ddply` function from the 'plyr' package (data frame as input, data frame as output, hence "dd"-ply). We first specify the data.

And then we specify the variables we want to use to group our data, 'Month' and 'variable'. We use the `.` function to refer to these variables directly, instead of referring to the values they contain.

Now we need to choose an aggregating function. We choose the `summarize` function here, because we have columns ('Day' and 'value') that we don't want to include in our final data. The `summarize` function will strip away all of the original, non-grouping columns.

Finally, we specify the calculation to do for each group. We can refer to the columns of the original data frame ('aqm'), even though they will not be contained in our final data frame. This is how it looks:
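A minimal sketch of the call described in Step 2. It assumes the built-in `airquality` data, whose id columns are capitalised (`Month`, `Day`); the result name `aqm2` is only an illustrative choice.

```r
library(reshape2)
library(plyr)

# Melt the built-in airquality data, dropping missing values.
aqm <- melt(airquality, id.vars = c("Month", "Day"), na.rm = TRUE)

# Group by Month and variable; summarize() keeps only the grouping
# columns plus the newly computed 'min' and 'max' columns.
# 'aqm2' is a hypothetical name used for illustration.
aqm2 <- ddply(aqm, .(Month, variable), summarize,
              min = min(value),
              max = max(value))

head(aqm2)
```

Each row of `aqm2` now describes one Month/variable combination instead of one daily measurement.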
Step 3: We can see that the data is vastly reduced, since the `ddply` function has aggregated the lines. Now we need to melt the data again, so we can get our second variable for the final data frame. Note that we need to specify a new `variable.name` argument, so we don't have two columns named "variable".

Step 4: And we can finally wrap it all up by casting our data into the final form.
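Steps 3 and 4 might look roughly like the sketch below. It repeats the earlier steps so that it runs on its own; the names `aqm2`, `aqm3`, `variable2` and `wide` are assumptions made for illustration, not taken from the original answer.

```r
library(reshape2)
library(plyr)

# Steps 1 and 2: melt, then aggregate to one min/max pair per group.
aqm  <- melt(airquality, id.vars = c("Month", "Day"), na.rm = TRUE)
aqm2 <- ddply(aqm, .(Month, variable), summarize,
              min = min(value), max = max(value))

# Step 3: melt again. The new key column gets its own name
# ('variable2', hypothetical) so it does not clash with 'variable'.
aqm3 <- melt(aqm2, id.vars = c("Month", "variable"),
             variable.name = "variable2")

# Step 4: cast to wide form, one column per variable/statistic pair.
wide <- dcast(aqm3, Month ~ variable + variable2, value.var = "value")

head(wide)
```

The second melt is what turns the `min` and `max` columns into a second key, which the casting formula can then combine with `variable`.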
Hopefully, this example will give you enough understanding to get you started. Be aware that a new, data frame-optimized version of the 'plyr' package is being actively developed under the name 'dplyr', so you may want to be ready to convert your code to the new package after it becomes more fully fledged.
With the recent commit to the development version of `data.table` v1.9.5, we can cast multiple `value.var` columns simultaneously (and also use multiple aggregation functions in `fun.aggregate`). Please see `?dcast` for more and also the examples section.

Here's how we could use `dcast`:

You can safely ignore the warnings.
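As a minimal sketch of that kind of call, assuming a data.table release that includes this feature (1.9.6 or later) and the built-in `airquality` data with its capitalised `Month` and `Day` columns; the name `wide` is illustrative only:

```r
library(data.table)

# Melt a data.table copy of airquality. data.table may warn that the
# measured columns mix integer and double types; the values are
# coerced to double.
aqm <- melt(as.data.table(airquality),
            id.vars = c("Month", "Day"), na.rm = TRUE)

# Pass a list of functions to fun.aggregate to get one output column
# per variable/function combination.
wide <- dcast(aqm, Month ~ variable,
              fun.aggregate = list(min, max),
              value.var = "value")

wide
```

Unlike the 'plyr' route, this needs no second melt, because `dcast` applies every function in the list in a single pass.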