How to create id variable by grouping sequenced nu

I want to add ID variable in this data. If receipt_ids are sequenced numbers, then those have same IDs.

CUST_NO_ID  receipt_id      dollar
  12         29             20.84
  12         30             20.21
  12         86             24.50
  12         87             20.68
  12        108             25.79
  12        109             24.93
  12        125             20.63
  12        126              9.90
  19        193             69.48
  19        194             46.88

here is my desired result

CUST_NO_ID  receipt_id      dollar       ID
  12         29             20.84        1
  12         30             20.21        1
  12         86             24.50        2
  12         87             20.68        2
  12        108             25.79        3
  12        109             24.93        3
  12        110             24.93        3
  12        125             20.63        4
  12        126              9.90        4
  19        193             69.48        5
  19        194             46.88        6

标签： r database dplyr

4条回答

傲

2楼-- · 2020-05-06 14:42

Assuming your data frame has already been sorted by CUST_NO_ID and receipt_id, you can use cumsum on a conditional vector where TRUE indicates the position a new ID should be created:

df$ID = cumsum(c(T, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0)))

df
#   CUST_NO_ID receipt_id dollar ID
#1          12         29  20.84  1
#2          12         30  20.21  1
#3          12         86  24.50  2
#4          12         87  20.68  2
#5          12        108  25.79  3
#6          12        109  24.93  3
#7          12        125  20.63  4
#8          12        126   9.90  4
#9          19        193  69.48  5
#10         19        194  46.88  5

0人赞添加讨论(0) 举报

【Aperson】

3楼-- · 2020-05-06 14:51

This does it

id <- 1

for(row in 1:nrow(data)){
  if(row == 1){
    dif <- 1
  }else{
    dif <- data[row,'receipt_id'] - data[row-1,'receipt_id']
  }

  if(dif != 1){
    id = id + 1
  }

  data[row,'ID'] = id
}

0人赞添加讨论(0) 举报

ら.Afraid

4楼-- · 2020-05-06 14:56

We can use data.table

library(data.table)
setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]

Or use the shift

setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]

0人赞添加讨论(0) 举报

Animai°情兽

5楼-- · 2020-05-06 15:01

Had a similar notion to @Psidom, but he beat me to the punch with cumsum. Here's a dplyr solution. Adding in group_by can give you added flexibility if you want to restart ids by customer number.

df %>% 
  mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))

0人赞添加讨论(0) 举报

How to create id variable by grouping sequenced nu

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间