I'm new to R and I have to deal with a large data set. I googled a lot but I just can't find a way to do what I need (although it sounds like an easy thing to do).
What I want to do is reshape my data into wide form. To do that the way I want, I need a new variable that numbers the rows in date order within each level of a factor (restarting at one for each new level).
Now, this is a small example of what I have:
ID<-c("A","A","A","B","B","C","D","D","D","D")
Date<-c("01-01-2014", "05-01-2014", "06-01-2014",
"01-01-2014", "12-01-2014", "25-01-2014",
"06-01-2014", "12-01-2014", "25-01-2014",
"26-01-2014")
Value<-c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
205.32, 99.99, 210.25, 105.125)
mydata<-data.frame(ID, Date, Value)
mydata
ID Date Value
1 A 01-01-2014 2.500
2 A 05-01-2014 3.400
3 A 06-01-2014 2.500
4 B 01-01-2014 305.660
5 B 12-01-2014 300.000
6 C 25-01-2014 55.010
7 D 06-01-2014 205.320
8 D 12-01-2014 99.990
9 D 25-01-2014 210.250
10 D 26-01-2014 105.125
(The data set is sorted first by the ID factor, then by date within each ID.)
And this is what I need: a new variable called "Order".
ID Date Value Order
1 A 01-01-2014 2.500 1
2 A 05-01-2014 3.400 2
3 A 06-01-2014 2.500 3
4 B 01-01-2014 305.660 1
5 B 12-01-2014 300.000 2
6 C 25-01-2014 55.010 1
7 D 06-01-2014 205.320 1
8 D 12-01-2014 99.990 2
9 D 25-01-2014 210.250 3
10 D 26-01-2014 105.125 4
The end goal is to reshape data based on the variable "Order" like this:
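# Note: reshape() below is stats::reshape from base R; library(reshape) is not required for it
library(reshape)
# mydata2 is assumed to be mydata with the Order column added
goal<-reshape(mydata2,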
idvar="ID",
timevar="Order",
direction="wide")
goal
ID Date.1 Value.1 Date.2 Value.2 Date.3 Value.3 Date.4 Value.4
1 A 01-01-2014 2.50 05-01-2014 3.40 06-01-2014 2.50 <NA> NA
4 B 01-01-2014 305.66 12-01-2014 300.00 <NA> NA <NA> NA
6 C 25-01-2014 55.01 <NA> NA <NA> NA <NA> NA
7 D 06-01-2014 205.32 12-01-2014 99.99 25-01-2014 210.25 26-01-2014 105.125
Or is there another way to reshape data like this without the "Order" variable?
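For reference, here is a minimal base R sketch of the counter described in the question. It assumes the rows are already sorted by ID and then by date, as stated. It uses ave() with seq_along to number the rows within each ID, then passes the result to reshape() from base R's stats package. The name mydata2 comes from the reshape call in the question; treating it as mydata plus the Order column is an assumption.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  ID = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date = c("01-01-2014", "05-01-2014", "06-01-2014",
           "01-01-2014", "12-01-2014", "25-01-2014",
           "06-01-2014", "12-01-2014", "25-01-2014",
           "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125),
  stringsAsFactors = FALSE
)

# Assumption: mydata2 is mydata plus the within-ID counter
mydata2 <- mydata

# Number the rows 1, 2, 3, ... within each ID, restarting at every new ID
mydata2$Order <- ave(seq_along(mydata2$ID), mydata2$ID, FUN = seq_along)

# stats::reshape() from base R: one row per ID, one Date/Value pair per Order
goal <- reshape(mydata2, idvar = "ID", timevar = "Order", direction = "wide")
goal
```

If the rows were not already sorted, they could be ordered first, for example with mydata[order(mydata$ID, as.Date(mydata$Date, format = "%d-%m-%Y")), ], since the dates are stored as day-month-year text.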
This is precisely what the getanID function in my "splitstackshape" package is for.

Alternatively, you can explore the development version of "data.table", which reimplements dcast in a very flexible way that will allow you to do this transformation without needing to generate a "time" variable.
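A short sketch of both suggestions, applied to the example data from the question. It assumes "splitstackshape" and a version of "data.table" that provides rowid() and multi-column value.var are installed. The object names mydata2, wide1 and wide2 are illustrative only.

```r
library(splitstackshape)  # provides getanID()
library(data.table)       # provides dcast() for data.tables and rowid()

# Example data from the question
mydata <- data.frame(
  ID = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  Date = c("01-01-2014", "05-01-2014", "06-01-2014",
           "01-01-2014", "12-01-2014", "25-01-2014",
           "06-01-2014", "12-01-2014", "25-01-2014",
           "26-01-2014"),
  Value = c(2.5, 3.4, 2.5, 305.66, 300.00, 55.01,
            205.32, 99.99, 210.25, 105.125),
  stringsAsFactors = FALSE
)

# getanID() returns a data.table with a within-ID counter column named ".id"
mydata2 <- getanID(mydata, id.vars = "ID")
mydata2

# Reshape to wide form, using ".id" as the "time" variable
wide1 <- dcast(mydata2, ID ~ .id, value.var = c("Date", "Value"))
wide1

# Without a precomputed counter: rowid(ID) builds it inside the formula
wide2 <- dcast(as.data.table(mydata), ID ~ rowid(ID),
               value.var = c("Date", "Value"))
wide2
```

Both calls should give one row per ID with the Date and Value columns spread across the counter values. Unlike base reshape(), the wide columns are grouped by variable (all Date columns, then all Value columns).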