I have a data frame like this:
id y1 y2 y3 y4
--+--+--+--+--
a |12|13|14|
b |12|18| |
c |13| | |
d |13|14|15|16
I want to reshape in such a way that I end with two columns. The above example would then become:
id from to
--+----+---
a |12 |13
a |13 |14
a |14 |
b |12 |18
b |18 |
c |13 |
d |13 |14
d |14 |15
d |15 |16
Each id
has a 'from' and a 'to' per pair of year values.
Does anybody know of an easy way to do this? I tried using reshape2
. I also looked at Combine Multiple Columns Into Tidy Data but I think my case is different.
You can use lapply
to loop over the pairs of columns and rbind
to union them:
do.call(rbind,
lapply(2:(length(df)-1),
function(x) setNames(df[!is.na(df[,x]),c(1,x,x+1)],
c("id", "from", "to"))))
id from to
1 a 12 13
2 b 12 18
3 c 13 NA
4 d 13 14
11 a 13 14
21 b 18 NA
41 d 14 15
12 a 14 NA
42 d 15 16
A solution uses dplyr
and tidyr
. dt2
is the final output.
# Create example data frame
dt <- data.frame(id = c("a", "b", "c", "d"),
y1 = c(12, 12, 13, 13),
y2 = c(13, 18, NA, 14),
y3 = c(14, NA, NA, 15),
y4 = c(NA, NA, NA, 16),
stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt2 <- dt %>%
gather(STEP, from, -id) %>%
drop_na(from) %>%
arrange(id, STEP) %>%
group_by(id) %>%
mutate(to = lead(from)) %>%
select(-STEP)
In base R, stack
and shift everything back one row in each group. Using @ycw's example data, dt
:
tmp <- na.omit(cbind(dt[1], stack(dt[-1])[-2]))
names(tmp)[2] <- "from"
tmp$to <- with(tmp, ave(from, id, FUN=function(x) c(tail(x,-1),NA) ))
tmp[order(tmp$id),]
# id from to
#1 a 12 13
#5 a 13 14
#9 a 14 NA
#2 b 12 18
#6 b 18 NA
#3 c 13 NA
#4 d 13 14
#8 d 14 15
#12 d 15 16
#16 d 16 NA
In the world of data.table
, the same logic applies. melt
, then shift
by=
id:
library(data.table)
dt <- as.data.table(dt)
melt(dt, id.vars="id", value.name="from")[
!is.na(from),-"variable"][, to := shift(from,1,type="lead"), by=id
][order(id)]