Must one melt a data frame prior to having it cast? From ?melt:
data: molten data frame, see melt.
In other words, is it absolutely necessary to have a data frame molten prior to any acast or dcast operation?
Consider the following:
library("reshape2")
library("MASS")
xb <- dcast(Cars93, Manufacturer ~ Type, mean, value.var="Price")
m.Cars93 <- melt(Cars93, id.vars=c("Manufacturer", "Type"), measure.vars="Price")
xc <- dcast(m.Cars93, Manufacturer ~ Type, mean, value.var="value")
Then:
> identical(xb, xc)
[1] TRUE
So in this case the melt operation seems to have been redundant.
What are the general guiding rules in these cases? How do you decide when a data frame needs to be molten prior to a *cast operation?
Whether or not you need to melt your dataset depends on what form you want the final data to be in and how that relates to what you currently have.
The way I generally think of it is:
dcast will create new columns based on the combination of the values.
To illustrate with a small example, consider this tiny dataset:
Imagine that our possible value variables are columns "D" or "E", but we are only interested in the values from "E". Imagine also that our primary "id" is column "A", and we want to spread the values out according to column "B". Column "C" is irrelevant at this point.
With that scenario, we would not need to melt the data first. We could simply do:
Compare what happens when you do the following, keeping in mind my three points above:
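As a minimal sketch of the no-melt case, assume a hypothetical data frame mydf with the columns "A" to "E" described above (the values are made up for illustration and are not the original dataset). Because the values of interest already sit in the single column "E", dcast can be applied to the data frame directly:

```r
library(reshape2)

# Hypothetical stand-in data: "A" is the id, "B" holds the future column
# names, "C" is ignored, and "D"/"E" are candidate value columns.
mydf <- data.frame(
  A = c("a", "a", "b", "b"),
  B = c("x", "y", "x", "y"),
  C = 1:4,
  D = c(10, 20, 30, 40),
  E = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# No melt needed: one row per value of A, one column per value of B,
# filled with the corresponding values from E.
wide_E <- dcast(mydf, A ~ B, value.var = "E")
print(wide_E)
```

Each combination of "A" and "B" occurs exactly once in this made-up data, so no aggregation function is needed and dcast should not fall back to counting rows.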
When is melt required?
Now, let's make one small adjustment to the scenario: We want to spread out the values from both columns "D" and "E" with no actual aggregation taking place. With this change, we need to melt the data first so that the relevant values that need to be spread out are in a single column (point 3 above).
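A hedged sketch of this second scenario, again using a hypothetical data frame mydf with made-up values rather than the original dataset: melt first stacks "D" and "E" into one value column, and dcast then spreads that column out by the combination of "B" and the variable name.

```r
library(reshape2)

# Hypothetical stand-in data with the columns "A" to "E" from the text.
mydf <- data.frame(
  A = c("a", "a", "b", "b"),
  B = c("x", "y", "x", "y"),
  C = 1:4,
  D = c(10, 20, 30, 40),
  E = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Step 1: stack D and E into a single "value" column; the "variable"
# column records which original column each value came from.
long <- melt(mydf, id.vars = c("A", "B"), measure.vars = c("D", "E"))

# Step 2: spread the single value column by every B/variable combination.
wide_DE <- dcast(long, A ~ B + variable, value.var = "value")
print(wide_DE)
```

The result has one row per value of "A" and one column for each pairing of "B" with "D" or "E"; since every pairing occurs once per id in this data, nothing is aggregated.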