I would like to create a new variable, Number, that generates sequential numbers within each groupID, starting once a particular condition is met (in this case, when Percent >= 5).
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)
Number <- ifelse (Percent < 5, 0, 1:4)
I get:
> Number
[1] 0 0 3 4 0 0 3 4 0 2 3 4
But I'd like:
0 0 1 2 0 0 1 2 0 1 2 3
I did not include the groupID variable in the ifelse statement and used 1:4 instead, as there are always 4 rows within each groupID.
Any suggestions or clues? Thank you!
It's ugly and throws warnings, but it gets you what you want:
ave(Percent,groupID,FUN=function(x) {x[x<5] <- 0; x[x>=5] <- 1:4; x} )
#[1] 0 0 1 2 0 0 1 2 0 1 2 3
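The warnings come from assigning 1:4 to fewer than four positions. A minimal sketch of the same idea without warnings, assuming the threshold of 5 from the question (`number_no_warn` and `hit` are names made up for this illustration):

```r
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c(3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)

# The replacement has exactly as many elements as there are
# values >= 5 in the group, so nothing needs to be recycled.
number_no_warn <- ave(Percent, groupID, FUN = function(x) {
  hit <- x >= 5                  # which rows meet the condition
  x[!hit] <- 0                   # rows below the threshold get 0
  x[hit] <- seq_len(sum(hit))    # rows at or above it get 1, 2, ...
  x
})
number_no_warn
#[1] 0 0 1 2 0 0 1 2 0 1 2 3
```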
@BondedDust's answer below using cumsum is almost certainly more appropriate, though.
If your data were not always in ascending order within each group, you could instead number only the values >= 5, like this:
Percent <- c( 3, 5, 4, 10, 2, 1, 6, 8, 4, 8, 10, 11)
ave(Percent, list(groupID,Percent>=5), FUN=function(x) cumsum(x>=5))
#[1] 0 1 0 2 0 0 1 2 0 1 2 3
ave(Percent, groupID, FUN=function(x) cumsum(x>=5))
#[1] 0 0 1 2 0 0 1 2 0 1 2 3
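To see why this works, it may help to step through a single group by hand. This is only an illustration using the Percent values of groupID 1 from the question; `x`, `hit` and `steps` are names made up here:

```r
x <- c(3, 4, 5, 10)    # Percent values for groupID 1

hit <- x >= 5          # logical test: FALSE FALSE TRUE TRUE
steps <- cumsum(hit)   # TRUE counts as 1 and FALSE as 0, so this is a running count
steps
#[1] 0 0 1 2
```

ave() simply applies that function to each groupID separately and puts the results back in the original row order.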
For the example in the comments below, this is my alternative logical test to pass to cumsum():
ave(Percent, groupID, FUN=function(x) cumsum(seq_along(x)>= which(x >=5)[1]) )
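The two tests differ only when a value drops back below 5 after the first hit. A small sketch with a made-up group (the vector `x` and the helper `count_from_first` are hypothetical, not taken from the comments); the helper also returns zeros when no value reaches the threshold, since which(...)[1] is NA in that case:

```r
x <- c(3, 6, 4, 7)   # hypothetical group that dips below 5 after the first hit

cumsum(x >= 5)       # counts only the values >= 5
#[1] 0 1 1 2

# Count every row from the first value >= threshold onwards.
count_from_first <- function(x, threshold = 5) {
  first_hit <- which(x >= threshold)[1]
  if (is.na(first_hit)) return(rep(0, length(x)))   # no value reaches the threshold
  cumsum(seq_along(x) >= first_hit)
}

count_from_first(x)
#[1] 0 1 2 3
```

It can be dropped into the call above as ave(Percent, groupID, FUN = count_from_first).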
Try this:
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)
Number <- Percent >= 5
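result <- lapply(seq_along(Number), function(i) {
  # Positions up to i where Percent < 5
  below <- which(!Number[1:i])
  # Start counting at the most recent such position (or at 1 if there is none)
  start <- if (length(below) == 0) 1 else max(below)
  # Number of values >= 5 since that position; note that ID is not used here
  sum(Number[start:i])
})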
> unlist(result)
[1] 0 0 1 2 0 0 1 2 0 1 2 3