How to create id variable by grouping sequenced nu

2020-05-06 14:35发布

问题:

I want to add ID variable in this data. If receipt_ids are sequenced numbers, then those have same IDs.

CUST_NO_ID  receipt_id      dollar
  12         29             20.84
  12         30             20.21
  12         86             24.50
  12         87             20.68
  12        108             25.79
  12        109             24.93
  12        125             20.63
  12        126              9.90
  19        193             69.48
  19        194             46.88

here is my desired result

CUST_NO_ID  receipt_id      dollar       ID
  12         29             20.84        1
  12         30             20.21        1
  12         86             24.50        2
  12         87             20.68        2
  12        108             25.79        3
  12        109             24.93        3
  12        110             24.93        3
  12        125             20.63        4
  12        126              9.90        4
  19        193             69.48        5
  19        194             46.88        6

回答1:

Had a similar notion to @Psidom, but he beat me to the punch with cumsum. Here's a dplyr solution. Adding in group_by can give you added flexibility if you want to restart ids by customer number.

df %>% 
  mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))


回答2:

Assuming your data frame has already been sorted by CUST_NO_ID and receipt_id, you can use cumsum on a conditional vector where TRUE indicates the position a new ID should be created:

df$ID = cumsum(c(T, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0)))

df
#   CUST_NO_ID receipt_id dollar ID
#1          12         29  20.84  1
#2          12         30  20.21  1
#3          12         86  24.50  2
#4          12         87  20.68  2
#5          12        108  25.79  3
#6          12        109  24.93  3
#7          12        125  20.63  4
#8          12        126   9.90  4
#9          19        193  69.48  5
#10         19        194  46.88  5


回答3:

This does it

id <- 1

for(row in 1:nrow(data)){
  if(row == 1){
    dif <- 1
  }else{
    dif <- data[row,'receipt_id'] - data[row-1,'receipt_id']
  }

  if(dif != 1){
    id = id + 1
  }

  data[row,'ID'] = id
}


回答4:

We can use data.table

library(data.table)
setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]

Or use the shift

setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]


标签: r database dplyr