R subtract value for the same ID (from the first I

2019-03-02 05:39发布

问题:

I have a similar question to the question found here: R, subtract value from previous row, group by (slight modification; see below):

In R, lets say I have this data.

Data
id      date        value
2380    10/30/12    21.01
2380    10/31/12    22.04
2380    11/1/12     22.65
2380    11/2/12     23.11
20100   10/30/12    35.21
20100   10/31/12    37.07
20100   11/1/12     38.17
20100   11/2/12     38.97
20103   10/30/12    57.98
20103   10/31/12    60.83 

And I want to subtract the value from the value of the ID of the same ID. I hope this makes sense. See below :)

id      date        value   diff
2380    10/30/12    21.01   0
2380    10/31/12    22.04   1.03
2380    11/1/12     22.65   1.64
2380    11/2/12     23.11   2.10
20100   10/30/12    35.21   0
20100   10/31/12    37.07   1.86
20100   11/1/12     38.17   2.96
20100   11/2/12     38.97   3.76
20103   10/30/12    57.98   0
20103   10/31/12    60.83   2.85

Thank you for your help!

回答1:

Let's assume that dates are already sorted. I would probably retrieve the first value for each id and then use this to compute the diff feature. Something like this.

my.df <- data.frame(
  id = c(2380, 2380, 2380, 2380, 20100,20100,20100, 20100, 20103, 20103),
  date = c("10/30/12", "10/31/12", "11/1/12", "11/2/12", "10/30/12", "10/31/12", "11/1/12", "11/2/12", "10/30/12", "10/31/12"),
  value = c(21.01, 22.04, 22.65, 23.11, 35.21, 37.07, 38.17, 38.97, 57.98, 60.83),
  stringsAsFactors = F)
#
# get ids
my.ids <- unique(my.df$id) # or levels(my.df$id)

# get first val (assuming sorting by date)
id.val0 <- sapply(my.ids, (function(id){
  my.df$value[my.df$id == id][1]
}))
names(id.val0) <- my.ids

# do operation
my.df$diff <- sapply(1:nrow(my.df), (function(i){
  tmp.id <- my.df$id[i]
  my.df$value[i] - id.val0[as.character(tmp.id)]
}))


回答2:

We can use functions from dplyr. dt is the original data. dt2 is the final output.

library(dplyr)

dt2 <- dt %>%
  group_by(id) %>%
  mutate(diff = value - first(value))

dt2
Source: local data frame [10 x 4]
Groups: id [3]

      id       date value  diff
   <int>      <chr> <dbl> <dbl>
1   2380 10/30/2012 21.01  0.00
2   2380 10/31/2012 22.04  1.03
3   2380  11/1/2012 22.65  1.64
4   2380  11/2/2012 23.11  2.10
5  20100 10/30/2012 35.21  0.00
6  20100 10/31/2012 37.07  1.86
7  20100  11/1/2012 38.17  2.96
8  20100  11/2/2012 38.97  3.76
9  20103 10/30/2012 57.98  0.00
10 20103 10/31/2012 60.83  2.85


回答3:

Using base R, you could do something like this.

res <- lapply(split(Data, Data$id), function(x) {
            x$diff <- x$value - x$value[1]
            x})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
      id     date value diff
1   2380 10/30/12 21.01 0.00
2   2380 10/31/12 22.04 1.03
3   2380  11/1/12 22.65 1.64
4   2380  11/2/12 23.11 2.10
5  20100 10/30/12 35.21 0.00
6  20100 10/31/12 37.07 1.86
7  20100  11/1/12 38.17 2.96
8  20100  11/2/12 38.97 3.76
9  20103 10/30/12 57.98 0.00
10 20103 10/31/12 60.83 2.85