How to create id variable by grouping sequenced nu

2020-05-06 14:05发布

I want to add ID variable in this data. If receipt_ids are sequenced numbers, then those have same IDs.

CUST_NO_ID  receipt_id      dollar
  12         29             20.84
  12         30             20.21
  12         86             24.50
  12         87             20.68
  12        108             25.79
  12        109             24.93
  12        125             20.63
  12        126              9.90
  19        193             69.48
  19        194             46.88

here is my desired result

CUST_NO_ID  receipt_id      dollar       ID
  12         29             20.84        1
  12         30             20.21        1
  12         86             24.50        2
  12         87             20.68        2
  12        108             25.79        3
  12        109             24.93        3
  12        110             24.93        3
  12        125             20.63        4
  12        126              9.90        4
  19        193             69.48        5
  19        194             46.88        6

标签: r database dplyr
4条回答
2楼-- · 2020-05-06 14:42

Assuming your data frame has already been sorted by CUST_NO_ID and receipt_id, you can use cumsum on a conditional vector where TRUE indicates the position a new ID should be created:

df$ID = cumsum(c(T, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0)))

df
#   CUST_NO_ID receipt_id dollar ID
#1          12         29  20.84  1
#2          12         30  20.21  1
#3          12         86  24.50  2
#4          12         87  20.68  2
#5          12        108  25.79  3
#6          12        109  24.93  3
#7          12        125  20.63  4
#8          12        126   9.90  4
#9          19        193  69.48  5
#10         19        194  46.88  5
查看更多
【Aperson】
3楼-- · 2020-05-06 14:51

This does it

id <- 1

for(row in 1:nrow(data)){
  if(row == 1){
    dif <- 1
  }else{
    dif <- data[row,'receipt_id'] - data[row-1,'receipt_id']
  }

  if(dif != 1){
    id = id + 1
  }

  data[row,'ID'] = id
}
查看更多
ら.Afraid
4楼-- · 2020-05-06 14:56

We can use data.table

library(data.table)
setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]

Or use the shift

setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]
查看更多
Animai°情兽
5楼-- · 2020-05-06 15:01

Had a similar notion to @Psidom, but he beat me to the punch with cumsum. Here's a dplyr solution. Adding in group_by can give you added flexibility if you want to restart ids by customer number.

df %>% 
  mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))
查看更多
登录 后发表回答