Similar questions have been asked, and I've been trying to cobble the answers (`rle`, `cumsum`, etc.) together from various ones, but it's taking me hours and I'm still not getting there.
I have a data set with a column containing `TRUE`/`FALSE` values only, e.g.:
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
For each run of consecutive `TRUE` values, I want a running count of the `TRUE`s in that run. The `FALSE` values can be ignored, i.e. I want an output for the above data that looks like this:
x2 <- c(0, 0, 1, 2, 3, 0, 1, 0, 1, 2, 0)
Maybe a bit ugly, but here we use `rle()` to find the runs of `TRUE` values. Then we use `seq.int()` to index within each run (which also numbers the `FALSE` runs), and multiply by the run value so the `FALSE` indexes are turned to 0.

A simple one in base R:
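Both ideas fit in a few lines of base R. The following is only a sketch, not the answerers' original code, and the helper names are hypothetical: `count_rle` follows the `rle()`/`seq.int()` description above, while `count_ave` is one possible compact alternative that assumes `ave()` grouped by `cumsum(!x)`.

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# rle()-based version (hypothetical helper name): number the positions inside
# every run, then zero out the FALSE runs by multiplying with the run value,
# since FALSE is treated as 0 in arithmetic.
count_rle <- function(x) {
  r <- rle(x)
  idx <- unlist(lapply(r$lengths, seq.int))  # 1..n within each run
  idx * rep(r$values, r$lengths)
}

# Compact alternative (an assumption, not taken from the thread): every FALSE
# starts a new group, and a cumulative sum inside each group counts the TRUEs
# that follow it.
count_ave <- function(x) {
  ave(as.integer(x), cumsum(!x), FUN = cumsum)
}

count_rle(x)
count_ave(x)
```

Both calls should reproduce `x2` from the question for the example vector.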
Here is another option using `split` and `cumsum`:

Benchmarking
Here are the `microbenchmark` results of all solutions so far:

Or, if you can consider non-`base` options (about 20 times faster on a `10^6` vector):

Consider the `runner` package, created especially for running counts, sums, etc. It is written entirely in C++.
The function `streak_run` counts consecutive occurrences of `TRUE` and also of `FALSE`, and multiplying by `x` is a quicker and easier version of `ifelse` in this case.

One can also specify the `k` parameter, which defines the window size. The window size can be constant or specified by another vector of the same length.

You can use
`Reduce`, in which we add the numbers, but if the next number is zero we start adding again. It is a tweak of the `cumsum` function with an `ifelse`, i.e. `Reduce(function(a, b) a + b, x, accumulate = TRUE)` is the `cumsum(x)` function.

Now we just introduce an `ifelse` statement so that whenever the next value is zero, the sum is set to zero and we start adding again. Here is the code:

You can also use `<<-` and implement the same logic as above:

In the first one, the cumulative sum is taken to be `a`, while in the second one the cumulative sum is taken to be `b`.
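The two variants described above can be sketched roughly as follows. This is an illustration under assumptions rather than the original answer's code: the wrapper names `count_reduce` and `count_superassign` are hypothetical, and the input is converted with `as.integer()` so the arithmetic is explicit.

```r
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Reduce() version: `a` is the running count so far, `b` is the next value.
# When the next value is zero the count is reset, otherwise it keeps growing.
count_reduce <- function(x) {
  Reduce(function(a, b) ifelse(b == 0, 0, a + b),
         as.integer(x), accumulate = TRUE)
}

# `<<-` version: here `b` holds the running count in the enclosing function,
# and `a` is the current value; `<<-` updates `b` across iterations.
count_superassign <- function(x) {
  b <- 0L
  vapply(as.integer(x), function(a) {
    b <<- if (a == 0L) 0L else a + b
    b
  }, integer(1))
}

count_reduce(x)
count_superassign(x)
```

Wrapping the `<<-` variant in a function keeps the counter `b` local to that function instead of modifying a variable in the global environment.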