Count consecutive TRUE values within each block se

Posted 2020-03-19 02:32

Similar questions have been asked, and I've been trying to cobble the answers (rle, cumsum, etc.) together from various ones but it's taking me hours and I'm still not getting there.

I have a data set with a column containing TRUE / FALSE values only, e.g.:

x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

For each set of continuous TRUE values, I want to count the number of TRUEs in that set. The FALSE values can be ignored, i.e. I want an output for the above data that looks like this:

x2 <- c(0, 0, 1, 2, 3, 0, 1, 0, 1, 2, 0)

Tags: r
7 answers
冷血范
#2 · 2020-03-19 03:10

Maybe a bit ugly, but here we use rle() to find the runs of TRUE values. Then use seq.int() to index the groups (which would also make groups for FALSE), but we multiply by the value so the FALSE indexes are turned to 0.

x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
# lapply() rather than sapply(): sapply() can simplify to a matrix when all runs have equal length
with(rle(x), unlist(Map(`*`, lapply(lengths, seq.int), values)))
# [1] 0 0 1 2 3 0 1 0 1 2 0
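
To see why this works, it can help to look at the intermediate pieces. Below is a minimal sketch using the same x; the names r, idx and res are only for illustration, and lapply() is used so the per-run counters always stay a list.

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# rle() compresses x into runs: how long each run is and which value it holds
r <- rle(x)
r$lengths
# [1] 2 3 1 1 1 2 1
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

# Count 1..n inside every run, TRUE and FALSE alike
idx <- lapply(r$lengths, seq_len)

# Multiplying by the run's value zeroes out the FALSE runs
res <- unlist(Map(`*`, idx, r$values))
res
# [1] 0 0 1 2 3 0 1 0 1 2 0
```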
Animai°情兽
#3 · 2020-03-19 03:12

A simple one in base R:

ave(x, cumsum(!x), FUN = cumsum)

#[1] 0 0 1 2 3 0 1 0 1 2 0
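
The grouping vector is what does the work here. As a rough illustration with the example x from the question (the names g and res are only for this sketch): every FALSE bumps the counter, so each FALSE and the TRUE values that follow it share a group, and cumsum() within a group counts the TRUEs.

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Every FALSE starts a new group; the TRUEs after it stay in that group
g <- cumsum(!x)
g
# [1] 1 2 2 2 2 3 3 4 4 4 5

# Within each group the leading FALSE contributes 0 and each TRUE adds 1
res <- ave(as.integer(x), g, FUN = cumsum)
res
# [1] 0 0 1 2 3 0 1 0 1 2 0
```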
家丑人穷心不美
#4 · 2020-03-19 03:13

Here is another option using split and cumsum:

unlist(sapply(split(x, cumsum(x == FALSE)), cumsum), use.names = FALSE)
# [1] 0 0 1 2 3 0 1 0 1 2 0

Benchmarking

Here are the microbenchmark results of all solutions so far:

library(microbenchmark)
library(runner)

set.seed(2017)
x <- sample(c(TRUE, FALSE), 10^4, replace = TRUE)

microbenchmark(
    cumsum_thelatemail = ave(x[x], cumsum(!x)[x], FUN = seq_along),
    reduce_Onyambu = Reduce(function(a, b) ifelse(b == 0, 0, a) + b, x, accumulate = TRUE),
    rle_MrFlick = with(rle(x), unlist(Map(`*`, sapply(lengths, seq.int), values))),
    runner_Gonzo = streak_run(x) * x,
    sequence_Henrik = sequence(rle(x)$lengths) * x,
    split_Evers = unlist(sapply(split(x, cumsum(x == FALSE)), cumsum), use.names = FALSE)
)
#Unit: microseconds
#               expr       min        lq       mean    median        uq       max neval
# cumsum_thelatemail  3569.336  3713.939  4196.6491  3802.570  4115.896  11360.39   100
#     reduce_Onyambu 40599.427 41884.466 45887.2020 43222.302 49277.158 103999.41   100
#        rle_MrFlick  9349.131  9907.766 11353.1854 10602.481 11213.147  46731.03   100
#       runner_Gonzo   275.912   293.085   316.6987   295.656   300.059    538.49   100
#    sequence_Henrik  2696.624  2840.593  3177.7400  2956.738  3179.673  11670.56   100
#        split_Evers  4772.078  4954.352  5423.3227  5193.803  5528.410  13607.49   100
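
Before comparing timings it can be worth confirming that the candidates return the same thing. The following is a sketch only, restricted to base R so it runs without extra packages; the names ref, res_ave, res_split, res_reduce and all_agree are made up for this check, and the Reduce variant uses a scalar if/else instead of ifelse(). The cumsum_thelatemail expression is left out because it returns counts for the TRUE positions only, so its result is shorter than x.

```r
set.seed(2017)
x <- sample(c(TRUE, FALSE), 10^4, replace = TRUE)

# Reference result from one of the base R answers
ref <- sequence(rle(x)$lengths) * x

# Other base R approaches, written out for comparison
res_ave    <- ave(as.integer(x), cumsum(!x), FUN = cumsum)
res_split  <- unlist(lapply(split(x, cumsum(!x)), cumsum), use.names = FALSE)
res_reduce <- unlist(Reduce(function(a, b) if (b) a + b else 0L, x, accumulate = TRUE))

# TRUE only if every approach matches the reference at every position
all_agree <- all(res_ave == ref, res_split == ref, res_reduce == ref)
all_agree
# [1] TRUE
```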
劳资没心,怎么记你
#5 · 2020-03-19 03:14
sequence(rle(x)$lengths) * x
#[1] 0 0 1 2 3 0 1 0 1 2 0

Or, if a non-base package is acceptable, data.table offers an alternative (about 20 times faster on a vector of length 10^6):

library(data.table)
rowid(rleid(x))*x
# [1] 0 0 1 2 3 0 1 0 1 2 0
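
sequence() builds 1..n for each run length and concatenates the pieces, so it numbers the positions inside every run, TRUE and FALSE alike; multiplying by x then blanks the FALSE runs. A small sketch of the intermediate values in base R (the names pos and res are for illustration only):

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Position of each element inside its own run
pos <- sequence(rle(x)$lengths)
pos
# [1] 1 2 1 2 3 1 1 1 1 2 1

# FALSE counts as 0 and TRUE as 1 in arithmetic, so only the TRUE runs survive
res <- pos * x
res
# [1] 0 0 1 2 3 0 1 0 1 2 0
```

The data.table version appears to follow the same idea: rleid(x) labels the runs and rowid() counts within each label.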
Lonely孤独者°
#6 · 2020-03-19 03:17

Consider the runner package, created especially for running counts, sums, etc. It is written entirely in C++.

devtools::install_github("gogonzo/runner")
library(runner)

x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
streak_run(x)*x 
# [1] 0 0 1 2 3 0 1 0 1 2 0

streak_run counts consecutive occurrences of any value, TRUE as well as FALSE, so multiplying by x is a quicker and easier alternative to ifelse in this case.

One can also specify the k parameter, which defines the window size. The window size can be constant or given by another vector of the same length.

We Are One
#7 · 2020-03-19 03:18

You can use Reduce: we add the numbers, but whenever the next value is zero we start adding again. It amounts to tweaking cumsum with an ifelse; that is, Reduce(function(a, b) a + b, x, accumulate = TRUE) is the same as cumsum(x). Now we just introduce an ifelse so that whenever the next value is zero, the sum is reset to zero and the adding starts again. Here is the code:

Reduce(function(a, b) ifelse(b == 0, 0, a) + b, x, accumulate = TRUE)
# [1] 0 0 1 2 3 0 1 0 1 2 0

You can also use <<- and implement the same logic as above:

# The leading b <- 0 only initialises b; [-1] drops it from the result
c(b<-0,sapply(x,function(a)b<<-ifelse(a==0,b<-0,a)+b))[-1]
# [1] 0 0 1 2 3 0 1 0 1 2 0

In the first one, the cumulative sum is held in a, while in the second one it is held in b.
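
For comparison, a plain running sum and the resetting variant can be put side by side. This is a sketch only: it uses a scalar if/else instead of ifelse(), which is a stylistic assumption rather than part of the answer above, and the helper name reset_sum is hypothetical.

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Without a reset, an accumulating Reduce() gives the same values as cumsum(x)
plain <- Reduce(function(a, b) a + b, x, accumulate = TRUE)
plain
# [1] 0 0 1 2 3 3 4 4 5 6 6

# reset_sum (hypothetical helper): a is the running total, b the next value;
# the total is dropped back to 0 whenever the next value is FALSE
reset_sum <- function(a, b) if (b) a + 1L else 0L
res <- Reduce(reset_sum, x, accumulate = TRUE)
res
# [1] 0 0 1 2 3 0 1 0 1 2 0
```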
