Edit -- This question was originally titled << Long to wide data reshaping in R >>
I'm just learning R and trying to find ways to apply it to help out others in my life. As a test case, I'm working on reshaping some data, and I'm having trouble following the examples I've found online. What I'm starting with looks like this:
ID Obs 1 Obs 2 Obs 3
1 43 48 37
1 27 29 22
1 36 32 40
2 33 38 36
2 29 32 27
2 32 31 35
2 25 28 24
3 45 47 42
3 38 40 36
And what I want to end up with will look like this:
ID Obs 1 mean Obs 1 std dev Obs 2 mean Obs 2 std dev
1 x x x x
2 x x x x
3 x x x x
And so forth. What I'm unsure of is whether I need additional information in my long-form data, or what. I imagine that the math part (finding the mean and standard deviations) will be the easy part, but I haven't been able to find a way that seems to work to reshape the data correctly to start in on that process.
Thanks very much for any help.
Here is probably the simplest way to go about it (with a reproducible example):
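A minimal base-R sketch of one simple way to do this, not necessarily the code this answer originally used. The column names `Obs_1`, `Obs_2`, `Obs_3` and the object names `DF` and `res` are assumptions made for illustration; the data are the rows from the question.

```r
# Question data; underscored column names are an assumption
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Obs_1 = c(43, 27, 36, 33, 29, 32, 25, 45, 38),
  Obs_2 = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Obs_3 = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# tapply() applies a function to one column within each ID group;
# its result is ordered by the sorted unique IDs
res <- data.frame(
  ID         = sort(unique(DF$ID)),
  Obs_1_mean = as.vector(tapply(DF$Obs_1, DF$ID, mean)),
  Obs_1_sd   = as.vector(tapply(DF$Obs_1, DF$ID, sd)),
  Obs_2_mean = as.vector(tapply(DF$Obs_2, DF$ID, mean)),
  Obs_2_sd   = as.vector(tapply(DF$Obs_2, DF$ID, sd))
)
res
```

Each output column is spelled out by hand, which is easy to follow but gets tedious as the number of observation columns grows.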
EDIT: The following approach saves you a lot of typing when dealing with many columns.
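One way to avoid naming every column, sketched in base R as an illustration rather than as the answer's original code. It assumes the same hypothetical column names (`Obs_1`, `Obs_2`, `Obs_3`) and relies on the `. ~ ID` formula, which tells `aggregate` to summarise every column other than `ID`.

```r
# Question data; underscored column names are an assumption
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Obs_1 = c(43, 27, 36, 33, 29, 32, 25, 45, 38),
  Obs_2 = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Obs_3 = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# One aggregate() call per statistic, covering all observation columns
means <- aggregate(. ~ ID, data = DF, FUN = mean)
sds   <- aggregate(. ~ ID, data = DF, FUN = sd)

# merge() appends the suffixes to the clashing column names
res <- merge(means, sds, by = "ID", suffixes = c("_mean", "_sd"))
res
```

Adding a fourth or fortieth observation column requires no change to this code.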
Here's another take on the `data.table` answers, using @Carson's data, that's a bit more readable (and also a little faster, because it uses `lapply` instead of `sapply`):
This is an aggregation problem, not a reshaping problem as the question originally suggested -- we wish to aggregate each column into a mean and standard deviation by ID. There are many packages that handle such problems. In base R it can be done using `aggregate` like this (assuming `DF` is the input data frame):
Note 1: A commenter pointed out that
`ag` is a data frame for which some columns are matrices. Although initially that may seem strange, in fact it simplifies access. `ag` has the same number of columns as the input `DF`. Its first column `ag[[1]]` is `ID` and the ith column of the remainder `ag[[i+1]]` (or equivalently `ag[-1][[i]]`) is the matrix of statistics for the ith input observation column. If one wishes to access the jth statistic of the ith observation it is therefore `ag[[i+1]][, j]` which can also be written as `ag[-1][[i]][, j]`.
On the other hand, suppose there are `k` statistic columns for each observation in the input (where k=2 in the question). Then if we flatten the output, to access the jth statistic of the ith observation column we must use the more complex `ag[[k*(i-1)+j+1]]` or equivalently `ag[-1][[k*(i-1)+j]]`.
For example, compare the simplicity of the first expression vs. the second:
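A minimal sketch of that comparison, assuming the input is held in `DF` with hypothetical column names `Obs_1`, `Obs_2`, `Obs_3`. The summary function and the flattening step via `do.call(data.frame, ...)` are one plausible reading of the text above, not necessarily the original code.

```r
# Question data; underscored column names are an assumption
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Obs_1 = c(43, 27, 36, 33, 29, 32, 25, 45, 38),
  Obs_2 = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Obs_3 = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# FUN returns two values, so each observation column becomes a 2-column matrix
ag <- aggregate(. ~ ID, DF, function(x) c(mean = mean(x), sd = sd(x)))

i <- 2  # second observation column
j <- 1  # first statistic (the mean)
k <- 2  # number of statistics per observation column

# Matrix-column form: pick the column, then the statistic
first <- ag[[i + 1]][, j]

# Flattened form: one plain column per statistic, indexed arithmetically
flat <- do.call(data.frame, ag)
second <- flat[[k * (i - 1) + j + 1]]
```

Both expressions select the per-ID means of the second observation column; the flattened version needs the index arithmetic, the matrix-column version does not.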
Note 2: The input in reproducible form is:
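A sketch of the question's data in reproducible form. The column names `Obs_1`, `Obs_2`, `Obs_3` are an assumption: the question writes them with spaces, which are not syntactic R names.

```r
# Rows copied from the question; header names adjusted (assumption)
Lines <- "ID Obs_1 Obs_2 Obs_3
1 43 48 37
1 27 29 22
1 36 32 40
2 33 38 36
2 29 32 27
2 32 31 35
2 25 28 24
3 45 47 42
3 38 40 36"

# read.table() parses the whitespace-separated text into a data frame
DF <- read.table(text = Lines, header = TRUE)
str(DF)
```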
There are a few different ways to go about it. `reshape2` is a helpful package. Personally, I like using `data.table`.
Below is a step-by-step, assuming `myDF` is your `data.frame`.
Also, this may or may not be helpful.
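A step-by-step `data.table` sketch along those lines, assuming `myDF` holds the question's data with hypothetical column names `Obs_1`, `Obs_2`, `Obs_3`. The intermediate names `DT`, `means`, `sds` and `res` are illustrative only.

```r
library(data.table)

# Question data; underscored column names are an assumption
myDF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Obs_1 = c(43, 27, 36, 33, 29, 32, 25, 45, 38),
  Obs_2 = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Obs_3 = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# Step 1: convert to a data.table
DT <- as.data.table(myDF)

# Step 2: .SD holds the non-grouping columns of each ID group
means <- DT[, lapply(.SD, mean), by = ID]
sds   <- DT[, lapply(.SD, sd), by = ID]

# Step 3: label the statistic in the column names (ID is left alone)
setnames(means, names(means)[-1], paste0(names(means)[-1], "_mean"))
setnames(sds, names(sds)[-1], paste0(names(sds)[-1], "_sd"))

# Step 4: join the two summaries on ID
res <- merge(means, sds, by = "ID")
res
```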
I add the `dplyr` solution.
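A minimal `dplyr` sketch, assuming dplyr 1.0.0 or later (for `across`) and the hypothetical column names `Obs_1`, `Obs_2`, `Obs_3`. It may differ from the code this answer originally showed.

```r
library(dplyr)

# Question data; underscored column names are an assumption
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  Obs_1 = c(43, 27, 36, 33, 29, 32, 25, 45, 38),
  Obs_2 = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Obs_3 = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# across() applies every function in the named list to every non-grouping
# column; output columns are named <column>_<function>
res <- DF %>%
  group_by(ID) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
res
```

The resulting columns come out as `Obs_1_mean`, `Obs_1_sd`, `Obs_2_mean`, and so on, which matches the layout asked for in the question.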