I'm running into difficulties reshaping a large dataframe. And I've been relatively fortunate in avoiding reshaping problems in the past, which also means I'm terrible at it.
My current dataframe looks something like this:
unique_id  seq  response  detailed.name  treatment
a          N1   123.23    descr. of N1   T1
a          N2   231.12    descr. of N2   T1
a          N3   231.23    descr. of N3   T1
...
b          N1   343.23    descr. of N1   T2
b          N2   281.13    descr. of N2   T2
b          N3   901.23    descr. of N3   T2
...
And I'd like:
seq  detailed.name  T1      T2
N1   descr. of N1   123.23  343.23
N2   descr. of N2   231.12  281.13
N3   descr. of N3   231.23  901.23
I've looked into the reshape package, but I'm not sure how I can convert the treatment factors into individual column names.
Thanks!
Edit: I tried running this on my local machine (a dual-core 3.06 GHz iMac with 4 GB of RAM) and it keeps failing with:
> d.tmp.2 <- cast(d.tmp, `SEQ_ID` + `GENE_INFO` ~ treatments)
Aggregation requires fun.aggregate: length used as default
R(5751) malloc: *** mmap(size=647168) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
I'll try running this on one of our bigger machines when I get a chance.
Another option would be to use spread from tidyr. The opposite action is performed by gather.
Also, there is dcast.data.table from data.table.
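As a rough sketch of the tidyr and data.table routes mentioned above, here is a small data frame rebuilt from the sample rows in the question. The object names (d, wide, long_again, wide_dt) are made up for illustration. The unique_id column is dropped before spreading, on the assumption that it only identifies the treatment group; if it were kept, the T1 and T2 rows would not line up. Calling dcast() on a data.table should dispatch to the data.table method.

```r
library(tidyr)
library(data.table)

# Sample data rebuilt from the rows shown in the question
d <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(paste("descr. of", c("N1", "N2", "N3")), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# Keep only the identifying columns, the key and the value
d_sub <- d[, c("seq", "detailed.name", "treatment", "response")]

# tidyr: one new column per treatment level, filled with response
wide <- spread(d_sub, key = treatment, value = response)

# gather() goes back from wide to long
long_again <- gather(wide, key = "treatment", value = "response", T1, T2)

# data.table: same reshape with a formula interface
wide_dt <- dcast(as.data.table(d), seq + detailed.name ~ treatment,
                 value.var = "response")
```

In recent tidyr releases spread() and gather() are superseded by pivot_wider() and pivot_longer(), but they are still available.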
You can also use the reshape function in the stats package. I don't have your sample dataset, but it will look something like this:
If you want to get the same results using reshape2, which is a faster and more memory-efficient rewrite of the reshape package, then the following will work.
The main change is the use of the dcast function when you want to cast with a data.frame as output. This replaces the cast function of reshape.
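The answer's original code is not shown here, so the following is only a sketch of what the two calls might look like on a data frame rebuilt from the question's sample rows. The names d, wide_base and wide_dcast are illustrative, not the answerer's.

```r
library(reshape2)

# Sample data rebuilt from the rows shown in the question
d <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(paste("descr. of", c("N1", "N2", "N3")), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# Base R: stats::reshape(), long to wide.
# unique_id is dropped so it is not treated as a varying measurement.
wide_base <- reshape(
  d[, c("seq", "detailed.name", "treatment", "response")],
  idvar     = c("seq", "detailed.name"),
  timevar   = "treatment",
  direction = "wide"
)
# The value columns come out named response.T1 and response.T2

# reshape2: dcast() returns a data.frame; value.var names the value column
wide_dcast <- dcast(d, seq + detailed.name ~ treatment, value.var = "response")
```

With dcast() the new columns are named T1 and T2 directly, so no renaming is needed afterwards.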
reshape always seems tricky to me too, but it always seems to work with a little trial and error. Here's what I ended up finding:
Your original data was already in long format, but not in the long format that melt/cast uses, so I re-melted it. The second argument (id.vars) is the list of columns not to melt. The third argument (measure.vars) is the list of columns that vary.
Then, the cast uses a formula. Left of the tilde are the things that stay as they are, and right of the tilde are the columns that are used to condition the value column.
More or less...!
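The melt-then-cast steps described above could look roughly like this. Note that this sketch uses melt() and dcast() from reshape2 rather than the melt()/cast() pair from the older reshape package that the answer refers to, and the data frame and object names (d, d_melt, wide) are made up from the question's sample rows.

```r
library(reshape2)

# Sample data rebuilt from the rows shown in the question
d <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(paste("descr. of", c("N1", "N2", "N3")), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# Re-melt: id.vars are left alone, measure.vars are stacked into
# a "variable" column (the name) and a "value" column (the number)
d_melt <- melt(d,
               id.vars      = c("seq", "detailed.name", "treatment"),
               measure.vars = "response")

# Cast: left of the tilde stays as rows, right of the tilde becomes columns
wide <- dcast(d_melt, seq + detailed.name ~ treatment, value.var = "value")
```

If the combinations on the left of the tilde are not unique per treatment, the cast falls back to counting rows with length, which is what the "Aggregation requires fun.aggregate" message in the question indicates.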
Building on Harlan's answer: the remelting step can be avoided if the data is already in the long format, and the column holding values is specified in the cast call.
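A minimal sketch of that shortcut, assuming the original reshape package is installed and that its cast() accepts the name of the value column through the value argument. The data frame is rebuilt from the question's sample rows and the names d and wide_cast are illustrative.

```r
# Sample data rebuilt from the rows shown in the question
d <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(paste("descr. of", c("N1", "N2", "N3")), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# No melt step: tell cast() directly which column holds the values.
# Called with reshape:: so it does not clash with reshape2 if that is loaded.
wide_cast <- reshape::cast(d, seq + detailed.name ~ treatment,
                           value = "response")
```

Skipping the melt also avoids building an intermediate copy of the data, which may help with the memory error on a large data frame.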