Suppose I have a numeric vector that may contain duplicated values:
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
I want to create another vector of counts as follows.
- It has the same length as x.
- For each unique value in x, the first appearance is 1, the second appearance is 2, and so on.
The new vector I want is
1, 1, 1, 1, 1, 2, 2, 3, 2
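To make the rule concrete, it can be written as a plain loop. This is only a reference sketch: `running_count` is a hypothetical helper name, it compares values by their character representation, and it will be slow on long vectors.

```r
# Hypothetical reference implementation of the rule described above.
# Assumption: values are matched by their as.character() form.
running_count <- function(x) {
  keys <- as.character(x)
  seen <- integer(0)            # named counter: occurrences seen so far per value
  out <- integer(length(x))
  for (i in seq_along(x)) {
    k <- keys[i]
    n <- if (k %in% names(seen)) seen[[k]] + 1L else 1L
    seen[k] <- n                # record the updated count for this value
    out[i] <- n
  }
  out
}

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
running_count(x)
# [1] 1 1 1 1 1 2 2 3 2
```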
I need a fast way of doing this since x can be really long.
Use ave and seq_along:
> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
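Roughly speaking, ave splits x into groups, applies FUN to each group, and writes the results back into the original positions. A minimal sketch of the same idea using base R's split, lapply and unsplit (the variable names `groups`, `counts` and `manual` are just for illustration):

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

groups <- split(x, x)                 # one sub-vector per distinct value of x
counts <- lapply(groups, seq_along)   # 1, 2, ... within each group
manual <- unsplit(counts, x)          # put the counts back in the original order

manual
# [1] 1 1 1 1 1 2 2 3 2

# Same values as the ave() call above (ave returns doubles here, manual is integer)
all(manual == ave(x, x, FUN = seq_along))
# [1] TRUE
```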
Another option to consider is data.table. Although it is a little bit more work, it might pay off on very long vectors.
Here it is on your example--definitely seems like overkill!
library(data.table)
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
DT <- data.table(id = sequence(length(x)), x, key = "id")
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2
But how about on a vector 10,000,000 items long?
set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)
funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}
funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}
identical(funAve(), funDT())
# [1] TRUE
library(microbenchmark)
microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#   funDT() 1.967795 2.029697 2.053886 2.070462 2.123531    20