I wish to create a sequential number within each run of equal values, like a counter of occurrences, which restarts once the value in the current row is different from the previous row.
Please find an example of input and expected output below.
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"))
dataset$counter <- c(1,1,2,1,2,1,1,2,3,4,1,1)
dataset
#    input counter
# 1      a       1
# 2      b       1
# 3      b       2
# 4      a       1
# 5      a       2
# 6      c       1
# 7      a       1
# 8      a       2
# 9      a       3
# 10     a       4
# 11     b       1
# 12     c       1
My question is very similar to this one: Cumulative sequence of occurrences of values.
The runner package has a dedicated function for this, streak_run. It is the fastest solution and accepts a vector as input.

You need to use sequence and rle:
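For reference, a minimal sketch of how sequence and rle can be combined on the example data from the question. The as.character() call is my own addition, in case the input column is a factor (rle only accepts atomic vectors).

```r
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"))

# rle() finds the runs of equal consecutive values and their lengths
runs <- rle(as.character(dataset$input))

# sequence() expands each run length n into 1:n and concatenates the results
dataset$counter <- sequence(runs$lengths)

dataset$counter
# 1 1 2 1 2 1 1 2 3 4 1 1
```

The counter restarts at every change of value because each run gets its own 1:n sequence.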
An efficient and more straightforward version of the function written below is now available in the data.table package, called rleid. Using that, it's just:

See ?rleid for more on usage and examples. Thanks to @Henrik for the suggestion to update this post.

rle is definitely the most convenient way to do it (+1 @Ananda's). But one could do better (in terms of speed) on bigger data. You can use the duplist and vecseq functions (not exported) from data.table as follows:

Benchmarking on big data:
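The original code and benchmark output did not survive here. As a sketch of what the rleid approach mentioned above could look like on the question's data (this assumes data.table is installed; the name dt and the seq_len(.N) idiom are my choices and not necessarily the original answer's code):

```r
library(data.table)

dt <- data.table(input = c("a","b","b","a","a","c","a","a","a","a","b","c"))

# rleid() assigns one id per run of equal consecutive values;
# within each run, seq_len(.N) numbers the rows 1, 2, ..., .N
dt[, counter := seq_len(.N), by = rleid(input)]

dt$counter
# 1 1 2 1 2 1 1 2 3 4 1 1
```

Grouping by rleid(input) rather than by input itself is what makes the counter restart each time the value changes, even when the same value reappears later.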