Multiple Conditional Cumulative Sum in R

Posted 2019-08-27 18:17

Question:

This is my data frame:

rd <- data.frame(
    Customer = rep("A",15),                 
    date_num = c(3,3,9,11,14,14,15,16,17,20,21,27,28,29,31),                  
    exp_cumsum_col = c(1,1,2,3,4,4,4,4,4,5,5,6,6,6,7))

I am trying to compute the third column (exp_cumsum_col), but after many attempts I still cannot get the correct values. This is the code I used:

rd<-as.data.frame(rd %>%
    group_by(customer) %>%                
    mutate(exp_cumsum_col = cumsum(row_number(ifelse(date_num[i]==date_num[i+1],1)))))

As long as date_num is continuous (each value is equal to or one more than the previous value), I treat that entire run as a single group. Whenever there is a break in date_num, exp_cumsum_col increases by 1. exp_cumsum_col starts at 1.

Answer 1:

We can take the difference of adjacent elements, check whether it is greater than 1, and take the cumsum of the result:

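library(dplyr)
# A new run starts at the first row and wherever the gap to the previous value exceeds 1
rd %>% 
   group_by(Customer) %>%
   mutate(newexp_col = cumsum(c(TRUE, diff(date_num) > 1)))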
#    Customer date_num exp_cumsum_col newexp_col
#1         A        3              1          1
#2         A        3              1          1
#3         A        9              2          2
#4         A       11              3          3
#5         A       14              4          4
#6         A       14              4          4
#7         A       15              4          4
#8         A       16              4          4
#9         A       17              4          4
#10        A       20              5          5
#11        A       21              5          5
#12        A       27              6          6
#13        A       28              6          6
#14        A       29              6          6
#15        A       31              7          7
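
To see why this works, it may help to split the one-liner into its intermediate steps. The following is a minimal base R sketch using the date_num values from the question. The names gaps, new_run and run_id are made up for illustration, and the second data frame with customer "B" is hypothetical data added only to show the per-group reset that group_by provides (done here with base R's ave instead of dplyr).

```r
# Same date_num values as in the question
date_num <- c(3, 3, 9, 11, 14, 14, 15, 16, 17, 20, 21, 27, 28, 29, 31)

# Step 1: differences between adjacent values (one element shorter than the input)
gaps <- diff(date_num)

# Step 2: a new run starts wherever the gap exceeds 1;
# the leading TRUE makes the very first row open run number 1
new_run <- c(TRUE, gaps > 1)

# Step 3: cumsum() counts TRUE as 1 and FALSE as 0, producing a run counter
run_id <- cumsum(new_run)
print(run_id)
# 1 1 2 3 4 4 4 4 4 5 5 6 6 6 7

# Hypothetical two-customer data (not from the question) to show the reset per group
df <- data.frame(
  Customer = c("A", "A", "A", "B", "B"),
  date_num = c(1, 2, 5, 10, 11)
)

# ave() applies the same logic within each Customer, so the counter restarts at 1
df$run_id <- ave(df$date_num, df$Customer,
                 FUN = function(x) cumsum(c(TRUE, diff(x) > 1)))
print(df)
```

Without the grouping step, the jump from the last row of one customer to the first row of the next would be treated as just another gap, and the counter would keep increasing instead of restarting at 1.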


Tags: r dplyr cumsum