So I've got data that looks like this:
id year principal interest
1: 011000600 2013 0.00 0.00
2: 011000600 2014 544.03 0.00
3: 011000700 2013 0.00 0.00
4: 011000700 2014 0.01 0.00
5: 011000800 2013 363.44 12.79
6: 011000800 2014 2005.98 0.00
7: 011000900 2013 0.00 0.00
8: 011000900 2014 0.00 0.00
9: 011001000 2013 0.00 0.00
10: 011001000 2014 0.00 0.00
11: 011001100 2013 0.00 0.00
12: 011001100 2014 1723.24 0.00
13: 011001560 2013 0.00 0.00
14: 011001560 2014 0.00 0.00
15: 011001650 2013 0.00 0.00
16: 011001650 2014 0.00 0.00
(basically a longitudinal sample of a bunch of variables)
The data is on the large side so I'm using data.table
for everything. I reshape it to get each id
unique by row:
datam<-melt(data,id=c("id","year"))
data1<-dcast.data.table(datam,id~...)
This yields:
id 2013_principal 2013_interest 2014_principal 2014_interest
1: 011000600 0.00 0.00 544.03 0
2: 011000700 0.00 0.00 0.01 0
3: 011000800 363.44 12.79 2005.98 0
4: 011000900 0.00 0.00 0.00 0
5: 011001000 0.00 0.00 0.00 0
6: 011001100 0.00 0.00 1723.24 0
This is course the form of data that I want, but having column names start with numbers is a pain in the keester.
Any suggestions for how to deal with this? I'd much rather have:
id principal_2013 interest_2013 principal_2014 interest_2014
1: 011000600 0.00 0.00 544.03 0
2: 011000700 0.00 0.00 0.01 0
3: 011000800 363.44 12.79 2005.98 0
4: 011000900 0.00 0.00 0.00 0
5: 011001000 0.00 0.00 0.00 0
6: 011001100 0.00 0.00 1723.24 0
(switching the year to be a suffix) I've tried being more explicit when casting, e.g.
data2<-dcast.data.table(datam,id~year+...)
data3<-dcast.data.table(datam,id~...+year)
To no avail:
data2
id 2013_principal 2013_interest 2014_principal 2014_interest
1: 011000600 0.00 0.00 544.03 0
2: 011000700 0.00 0.00 0.01 0
3: 011000800 363.44 12.79 2005.98 0
4: 011000900 0.00 0.00 0.00 0
5: 011001000 0.00 0.00 0.00 0
6: 011001100 0.00 0.00 1723.24 0
data3
id 2013_principal 2013_interest 2014_principal 2014_interest
1: 011000600 0.00 0.00 544.03 0
2: 011000700 0.00 0.00 0.01 0
3: 011000800 363.44 12.79 2005.98 0
4: 011000900 0.00 0.00 0.00 0
5: 011001000 0.00 0.00 0.00 0
6: 011001100 0.00 0.00 1723.24 0
Seems pretty silly for the naming convention of dcast to default to this style, given that I imagine this type of reshaping is ubiquitous.
I've also tried patching things up ex-post given some other posts I've found (e.g. here), but it runs unfathomably slow (there are ~400 variables to rename in the full data set)
names(data)<-ifelse(substr(names(data),1,2) %in% c("19","20"),
paste(substr(names(data),6,nchar(data)),
substr(names(data),1,4),sep="_") ,
names(copy))
(I'm trying to find all the variables starting with years--19xx or 20xx--and trying to swap the beginning and end)
FR #5675 is now implemented in v1.9.3. From NEWS
That is, now you can do:
and the column names will have
year
values at the end, as expected.Also added documentation - Doc #5676. From NEWS:
Now
?dcast.data.table
contains the added line:HTH
@Arun's solution is to explicitly set the right hand of the formula to the order that you wish to name the columns.
Even better, with the new developments to
dcast
in data.table from v1.9.5+, we can cast multiple columns simultaneously..No more having to
melt
unnecessarily before tocast
, therefore quite efficient.