I have a similar question to the question found here:
R, subtract value from previous row, group by (slight modification; see below):
In R, lets say I have this data.
Data
id date value
2380 10/30/12 21.01
2380 10/31/12 22.04
2380 11/1/12 22.65
2380 11/2/12 23.11
20100 10/30/12 35.21
20100 10/31/12 37.07
20100 11/1/12 38.17
20100 11/2/12 38.97
20103 10/30/12 57.98
20103 10/31/12 60.83
And I want to subtract the value from the value of the ID of the same ID. I hope this makes sense. See below :)
id date value diff
2380 10/30/12 21.01 0
2380 10/31/12 22.04 1.03
2380 11/1/12 22.65 1.64
2380 11/2/12 23.11 2.10
20100 10/30/12 35.21 0
20100 10/31/12 37.07 1.86
20100 11/1/12 38.17 2.96
20100 11/2/12 38.97 3.76
20103 10/30/12 57.98 0
20103 10/31/12 60.83 2.85
Thank you for your help!
Let's assume that dates are already sorted. I would probably retrieve the first value for each id and then use this to compute the diff feature.
Something like this.
my.df <- data.frame(
id = c(2380, 2380, 2380, 2380, 20100,20100,20100, 20100, 20103, 20103),
date = c("10/30/12", "10/31/12", "11/1/12", "11/2/12", "10/30/12", "10/31/12", "11/1/12", "11/2/12", "10/30/12", "10/31/12"),
value = c(21.01, 22.04, 22.65, 23.11, 35.21, 37.07, 38.17, 38.97, 57.98, 60.83),
stringsAsFactors = F)
#
# get ids
my.ids <- unique(my.df$id) # or levels(my.df$id)
# get first val (assuming sorting by date)
id.val0 <- sapply(my.ids, (function(id){
my.df$value[my.df$id == id][1]
}))
names(id.val0) <- my.ids
# do operation
my.df$diff <- sapply(1:nrow(my.df), (function(i){
tmp.id <- my.df$id[i]
my.df$value[i] - id.val0[as.character(tmp.id)]
}))
We can use functions from dplyr
. dt
is the original data. dt2
is the final output.
library(dplyr)
dt2 <- dt %>%
group_by(id) %>%
mutate(diff = value - first(value))
dt2
Source: local data frame [10 x 4]
Groups: id [3]
id date value diff
<int> <chr> <dbl> <dbl>
1 2380 10/30/2012 21.01 0.00
2 2380 10/31/2012 22.04 1.03
3 2380 11/1/2012 22.65 1.64
4 2380 11/2/2012 23.11 2.10
5 20100 10/30/2012 35.21 0.00
6 20100 10/31/2012 37.07 1.86
7 20100 11/1/2012 38.17 2.96
8 20100 11/2/2012 38.97 3.76
9 20103 10/30/2012 57.98 0.00
10 20103 10/31/2012 60.83 2.85
Using base R
, you could do something like this.
res <- lapply(split(Data, Data$id), function(x) {
x$diff <- x$value - x$value[1]
x})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
id date value diff
1 2380 10/30/12 21.01 0.00
2 2380 10/31/12 22.04 1.03
3 2380 11/1/12 22.65 1.64
4 2380 11/2/12 23.11 2.10
5 20100 10/30/12 35.21 0.00
6 20100 10/31/12 37.07 1.86
7 20100 11/1/12 38.17 2.96
8 20100 11/2/12 38.97 3.76
9 20103 10/30/12 57.98 0.00
10 20103 10/31/12 60.83 2.85