Create the frequency count from a vector in R [duplicate]

Posted 2019-06-28 03:19

Question:

This question already has an answer here:

  • Numbering rows within groups in a data frame 5 answers

Suppose there is a numeric vector that may contain duplicated values:

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

I want to create another vector of counts as follows.

  1. It has the same length as x.
  2. For each unique value in x, the first appearance is 1, the second appearance is 2, and so on.

The new vector I want is

1, 1, 1, 1, 1, 2, 2, 3, 2

I need a fast way of doing this since x can be really long.

Answer 1:

Use ave and seq_along:

> x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
> ave(x, x, FUN = seq_along)
[1] 1 1 1 1 1 2 2 3 2
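
To see why this works, here is a rough base R sketch of what ave is doing with seq_along: group the positions of x by value, then number the positions inside each group. The names idx and out are just illustrative, and this is only meant to show the idea, not to replace ave.

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)

# Positions of x grouped by value, e.g. the value 2 sits at positions 2, 7, 8
idx <- split(seq_along(x), x)

# Fill in 1, 2, 3, ... at each group's positions, in order of appearance
out <- integer(length(x))
for (pos in idx) {
  out[pos] <- seq_along(pos)
}

out
# [1] 1 1 1 1 1 2 2 3 2
```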

Another option to consider is data.table. Although it is a little bit more work, it might pay off on very long vectors.

Here it is on your example--definitely seems like overkill!

library(data.table)

x <- c(1, 2, 3, 4, 5, 1, 2, 2, 3)
DT <- data.table(id = sequence(length(x)), x, key = "id")
DT[, y := sequence(.N), by = x][, y]
# [1] 1 1 1 1 1 2 2 3 2

But how about on a vector 10,000,000 items long?

set.seed(1)
x2 <- sample(100, 1e7, replace = TRUE)

funAve <- function() {
  ave(x2, x2, FUN = seq_along)
}

funDT <- function() {
  DT <- data.table(id = sequence(length(x2)), x2, key = "id")
  DT[, y := sequence(.N), by = x2][, y]
}

identical(funAve(), funDT())
# [1] TRUE

library(microbenchmark)
microbenchmark(funAve(), funDT(), times = 20)
# Unit: seconds
#      expr      min       lq   median       uq      max neval
#  funAve() 6.727557 6.792743 6.827117 6.992609 7.352666    20
#   funDT() 1.967795 2.029697 2.053886 2.070462 2.123531    20
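
If you would rather not depend on a package, one more possibility is a fully vectorized base R version built from order, rle and sequence. This is only a sketch: running_count is a hypothetical helper name, it assumes x has no NA values, and I have not timed it against the two functions above, so benchmark it on your own data before relying on it.

```r
# Hypothetical helper (not from any package); assumes x contains no NAs
running_count <- function(x) {
  o <- order(x)                  # positions sorted by value; ties keep original order
  runs <- rle(x[o])$lengths      # size of each block of equal values
  res <- integer(length(x))
  res[o] <- sequence(runs)       # 1..n within each block, mapped back to original positions
  res
}

running_count(c(1, 2, 3, 4, 5, 1, 2, 2, 3))
# [1] 1 1 1 1 1 2 2 3 2
```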


Tags: r vector