I have a dataframe with panel structure: 2 observations for each unit from two years:
library(tidyr)
mydf <- data.frame(
id = rep(1:3, rep(2,3)),
year = rep(c(2012, 2013), 3),
value = runif(6)
)
mydf
# id year value
#1 1 2012 0.09668064
#2 1 2013 0.62739399
#3 2 2012 0.45618433
#4 2 2013 0.60347152
#5 3 2012 0.84537624
#6 3 2013 0.33466030
I would like to reshape this data to wide format which can be done easily with tidyr::spread
. However, as the values of the year
variable are numbers, the names of my new variables become numbers as well which makes its further use harder.
spread(mydf, year, value)
# id 2012 2013
#1 1 0.09668064 0.6273940
#2 2 0.45618433 0.6034715
#3 3 0.84537624 0.3346603
I know I can easily rename the columns. However, if I would like to reshape within a chain with other operations, it becomes inconvenient. E.g. the following line obviously does not make sense.
library(dplyr)
mydf %>% spread(year, value) %>% filter(2012 > 0.5)
The following works but is not that concise:
tmp <- spread(mydf, year, value)
names(tmp) <- c("id", "y2012", "y2013")
filter(tmp, y2012 > 0.5)
Any idea how I can change the new variable names within spread
?
I know some years has passed since this question was originally asked, but for posterity I want to also highlight the
sep
argument ofspread
. When notNULL
, it will be used as separator between the key name and values:This is not exactly as wanted in the question, but sufficient for my purposes. See
?spread
.Update with tidyr 1.0.0: tidyr 1.0.0 have now introduced
pivot_wider
(andpivot_longer
) which allows for more control in this respect with the argumentsnames_sep
andnames_prefix
. So now the call would be:To get exactly what was originally wanted (prefixing "y" only) you can of course now get that directly by simply having
names_prefix = "y"
.The
names_sep
is used in case you gather over multiple columns as demonstrated below where I have added quarters to the data:Another option is to use the
setNames()
function as the next thing in the pipe:The only problem using setNames is that you have to know exactly what your columns will be when you
spread()
them. Most of the time, that's not a problem, particularly if you're working semi-interactively.But if you're missing a key/value pair in your original data, there's a chance it won't show up as a column, and you can end up naming your columns incorrectly without even knowing it. Granted,
setNames()
will throw an error if the number of names doesn't match the number of columns, so you've got a bit of error checking built in.Still, the convenience of using
setNames()
has outweighed the risk more often than not for me.Using
spread()
's successorpivot_wider()
we can give a prefix to the created columns :Created on 2019-09-14 by the reprex package (v0.3.0)
rename() in dplyr should do the trick
You can use
backticks
for column names starting with numbers andfilter
should work as expectedOr another option would be using
unite
to join two columns to a single columnn after creating a second column 'year1' with string 'y'.Even we can change the 'year' column within
mutate
by usingpaste