I have a fairly large data frame structured like this:
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
I need to create a dataframe like this:
id xsource xvalue yvalue zvalue v
1 x1 2 10 200 2.5
1 x2 4 20 150 2.5
1 x3 5 15 170 2.5
2 x1 3 25 300 4.2
2 x2 7 35 350 4.2
2 x3 6 40 400 4.2
I'm quite sure I have to do it with the reshape package, but I'm not able to get what I want.
Could you help me?
Thanks
Somebody please prove me wrong, but I don't think it's easy to solve this problem using either the reshape package or the base reshape function. However, it's easy enough using lapply and do.call.
Replicate the data:
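A minimal sketch of this step, assuming the two-row sample from the question is all that is needed (the object name df is an assumption, not taken from the original post):

```r
# Rebuild the wide sample data shown in the question.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)
```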
Do the analysis:
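One way the lapply and do.call idea could look is sketched below. It builds one small data frame per suffix (1, 2, 3) and stacks them with rbind. The names df and long are assumptions, and the sample data is rebuilt so the block runs on its own.

```r
# Sample data from the question, rebuilt so this block is self-contained.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)

# For each suffix i, pick the matching x/y/z columns, then stack the pieces.
long <- do.call(rbind, lapply(1:3, function(i) {
  data.frame(
    id      = df$id,
    xsource = paste0("x", i),
    xvalue  = df[[paste0("x", i)]],
    yvalue  = df[[paste0("y", i)]],
    zvalue  = df[[paste0("z", i)]],
    v       = df$v
  )
}))

# Sort so that the rows for each id sit together, as in the desired output.
long <- long[order(long$id, long$xsource), ]
rownames(long) <- NULL
long
```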
Here's the reshape() solution.
The key bit is that the varying= argument can take a list of vectors of column names in the wide format that correspond to single variables in the long format. In this case, columns "x1", "x2", "x3" in the original data frame are sent to one column in the long data frame, columns "y1", "y2", "y3" will go into a second column, and so on.
Finally, a couple of purely cosmetic steps are needed to get the results looking exactly as shown in your question:
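A sketch of how the reshape() call and the cosmetic steps might be written, using only base R. The object names df and dfL are assumptions, and the sample data is rebuilt so the block runs on its own.

```r
# Sample data from the question.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)

# Each vector in `varying` collapses into one long-format column,
# named by the corresponding entry of `v.names`.
dfL <- reshape(
  df,
  direction = "long",
  idvar     = "id",
  varying   = list(c("x1", "x2", "x3"),
                   c("y1", "y2", "y3"),
                   c("z1", "z2", "z3")),
  v.names   = c("xvalue", "yvalue", "zvalue"),
  timevar   = "xsource",
  times     = c("x1", "x2", "x3")
)

# Cosmetic steps: sort by id, reorder the columns, drop the row names.
dfL <- dfL[order(dfL$id, dfL$xsource),
           c("id", "xsource", "xvalue", "yvalue", "zvalue", "v")]
rownames(dfL) <- NULL
dfL
```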
Try using the reshapeGUI package. It utilizes the plyr package and the reshape2 package, and it provides an easy-to-use interface that lets you preview your reshape before you execute it. It also gives you the code for the reshape you're doing, so you can paste it into your script for reproducibility and learn to use the melt and cast commands in reshape2. It's a nice crutch for complex data manipulations like this one for those who aren't reshape ninjas.
Here's one approach that uses reshape2 and is described in depth in my paper on tidy data.
Step 1: identify the variables that are already in columns. In this case: id and v. These are the variables we melt by:
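Step 1 could be sketched like this, assuming reshape2 is installed. The names df and molten are assumptions chosen for illustration.

```r
library(reshape2)

# Sample data from the question.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)

# Keep id and v as identifier columns; every other column is stacked
# into a (variable, value) pair.
molten <- melt(df, id.vars = c("id", "v"))
head(molten)
```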
Step 2: split up variables that are currently combined in one column. In this case that's source (the character part) and rep (the integer part). There are lots of ways to do this; I'm going to use string extraction with the stringr package.
Step 3: rearrange the variables that are currently in the rows but that we want in columns:
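Steps 2 and 3 might look like the sketch below, assuming reshape2 and stringr are installed. The melt from step 1 is repeated so the block runs on its own. The names df, molten and wide are assumptions, and str_extract is just one of several string functions that would do the job.

```r
library(reshape2)
library(stringr)

# Sample data from the question, melted by id and v (step 1).
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)
molten <- melt(df, id.vars = c("id", "v"))

# Step 2: split "x1" into source = "x" and rep = 1.
variable <- as.character(molten$variable)
molten$source <- str_extract(variable, "^[a-z]+")
molten$rep <- as.integer(str_extract(variable, "[0-9]+$"))
molten$variable <- NULL

# Step 3: move source back into columns, one row per (id, v, rep).
wide <- dcast(molten, id + v + rep ~ source, value.var = "value")
wide <- wide[order(wide$id, wide$rep), ]
wide
```

The result has columns id, v, rep, x, y and z; renaming x, y and z to xvalue, yvalue and zvalue, and turning rep into an xsource label, would give the exact layout asked for in the question.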
Here are two more recent approaches that might be of interest to someone reading this question:
Option 1: The tidyverse
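A sketch of the tidyverse route, assuming tidyr 1.0.0 or later (for pivot_longer) and dplyr are installed. The special ".value" name tells pivot_longer to use the letter part of each column name as the output column, while the digit part goes into rep. The names df and long are assumptions.

```r
library(tidyr)
library(dplyr)

# Sample data from the question.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)

long <- df %>%
  pivot_longer(
    cols          = -c(id, v),
    names_to      = c(".value", "rep"),
    names_pattern = "([a-z]+)([0-9]+)"
  ) %>%
  # Rebuild the xsource label and match the column names in the question.
  mutate(xsource = paste0("x", rep)) %>%
  select(id, xsource, xvalue = x, yvalue = y, zvalue = z, v)
long
```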
Option 2: data.table
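A sketch of the data.table route, assuming the data.table package is installed. Its melt method accepts several column groups at once through patterns(), with one value.name per group. The names df, dt and long are assumptions.

```r
library(data.table)

# Sample data from the question.
df <- data.frame(
  id = 1:2,
  x1 = c(2, 3),     x2 = c(4, 7),     x3 = c(5, 6),
  y1 = c(10, 25),   y2 = c(20, 35),   y3 = c(15, 40),
  z1 = c(200, 300), z2 = c(150, 350), z3 = c(170, 400),
  v  = c(2.5, 4.2)
)
dt <- as.data.table(df)

# Melt the x, y and z column groups in parallel.
long <- melt(
  dt,
  id.vars       = c("id", "v"),
  measure.vars  = patterns("^x", "^y", "^z"),
  value.name    = c("xvalue", "yvalue", "zvalue"),
  variable.name = "xsource"
)

# The variable column holds the group index (1, 2, 3); turn it into x1, x2, x3.
long[, xsource := paste0("x", xsource)]
setorder(long, id, xsource)
setcolorder(long, c("id", "xsource", "xvalue", "yvalue", "zvalue", "v"))
long
```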