data.table “key indices” or “group counter”

Posted 2019-01-03 17:01

After creating a key on a data.table:

library(data.table)
set.seed(12345)
DT <- data.table(x = sample(LETTERS[1:3], 10, replace = TRUE),
                 y = sample(LETTERS[1:3], 10, replace = TRUE))
setkey(DT, x, y)
DT
#       x y
#  [1,] A B
#  [2,] A B
#  [3,] B B
#  [4,] B B
#  [5,] C A
#  [6,] C A
#  [7,] C A
#  [8,] C A
#  [9,] C C
# [10,] C C

I would like to get an integer vector giving for each row the corresponding "key index". I hope the expected output (column i) below will help clarify what I mean:

#       x y i
#  [1,] A B 1
#  [2,] A B 1
#  [3,] B B 2
#  [4,] B B 2
#  [5,] C A 3
#  [6,] C A 3
#  [7,] C A 3
#  [8,] C A 3
#  [9,] C C 4
# [10,] C C 4

I thought about using something like cumsum(!duplicated(DT[, key(DT), with = FALSE])) but am hoping there is a better solution. I feel this vector could be part of the table's internal representation, and maybe there is a way to access it? Even if it is not the case, what would you suggest?
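For reference, the cumsum(!duplicated(...)) idea can be sketched as follows. This is a minimal illustration on a small hand-written table (the data are made up for the example, not the sampled DT above), and it relies on the table being sorted by its key so that each first occurrence of a key combination starts a new group:

```r
library(data.table)

# Hypothetical six-row table, used only for illustration
DT2 <- data.table(x = c("A", "A", "B", "B", "C", "C"),
                  y = c("B", "B", "B", "B", "A", "C"))
setkey(DT2, x, y)

# duplicated() flags repeats of a key combination; negating it marks the
# first row of each group, and cumsum() turns those marks into a counter.
i <- cumsum(!duplicated(DT2[, key(DT2), with = FALSE]))
i
# [1] 1 1 2 2 3 4
```

The counter is only meaningful because setkey() has already sorted the rows; on an unsorted table, equal key combinations that are not adjacent would still receive the number of their first occurrence's group only by accident.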

Tags: r data.table
2 answers
孤傲高冷的网名
Reply #2 · 2019-01-03 17:37

I'd probably just do this, since I'm fairly confident that no index counter is available from within the call to [.data.table():

ii <- unique(DT)
ii[ , i := seq_len(nrow(ii))]
DT[ii]
#     x y i
#  1: A B 1
#  2: A B 1
#  3: B B 2
#  4: B B 2
#  5: C A 3
#  6: C A 3
#  7: C A 3
#  8: C A 3
#  9: C C 4
# 10: C C 4

You could make this a one-liner, at the expense of an additional call to unique.data.table():

DT[unique(DT)[ , i := seq_len(nrow(unique(DT)))]]
chillily
Reply #3 · 2019-01-03 17:38

Update: As of data.table v1.8.3, you can simply use the built-in special symbol .GRP:

DT[ , i := .GRP, by = key(DT)]
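As a self-contained sketch of the .GRP approach, here is the same call on a small hand-written table (the data and the name DT3 are made up for illustration). It assumes a data.table version that provides .GRP:

```r
library(data.table)

# Hypothetical unsorted input; setkey() reorders the rows by x, then y
DT3 <- data.table(x = c("C", "A", "B", "A", "C", "B"),
                  y = c("A", "B", "B", "B", "C", "B"))
setkey(DT3, x, y)

# .GRP is the running group number within a by= call; grouping by the key
# columns numbers the groups in key order. := adds the column by reference.
DT3[, i := .GRP, by = key(DT3)]
DT3$i
# [1] 1 1 2 2 3 4
```

Because := modifies the table in place, no join against unique(DT3) or copy of the table is needed.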

See history for older answers.
