structure(list(group = c(17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L,
19L, 19L, 20L, 20L, 20L, 21L, 21L, 22L, 23L, 24L, 25L, 25L, 25L,
26L, 27L, 27L, 27L, 28L), var = c(74L, 49L, 1L, 74L, 1L, 49L,
61L, 49L, 1L, 5L, 5L, 1L, 44L, 44L, 12L, 13L, 5L, 5L, 1L, 1L,
4L, 4L, 1L, 1L, 1L, 49L), first = c(0, 0, 1, 0, 1, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)), .Names = c("group",
"var", "first"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-26L))
With the data from the first two columns, I would like to create a third column (called first) where first == 1 only when var == 1 for the first time in a group. In other words, I would like to mark the first element within each group that fulfils var == 1. How can I do that in dplyr? Certainly group_by should be used, but what next?
We can use the expression shown for first, giving:
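A minimal sketch of one expression that reproduces the first column from the question. The name df and the use of duplicated() are assumptions made here for illustration, not necessarily the expression originally posted:

```r
library(dplyr)

# Rebuild the question's first two columns (df is an illustrative name).
df <- data.frame(
  group = rep(17:28, times = c(3, 4, 3, 3, 2, 1, 1, 1, 3, 1, 3, 1)),
  var = c(74, 49, 1, 74, 1, 49, 61, 49, 1, 5, 5, 1, 44, 44, 12, 13,
          5, 5, 1, 1, 4, 4, 1, 1, 1, 49)
)

# Within each group, var == 1 is TRUE/FALSE; !duplicated() keeps only the
# first TRUE (and the first FALSE), and the & drops the FALSE rows again.
out <- df %>%
  group_by(group) %>%
  mutate(first = as.integer(var == 1 & !duplicated(var == 1))) %>%
  ungroup()

out$first
# values: 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0
```

Groups without any var == 1 row simply get zeros, and groups such as 25 and 27 get a 1 only on their first matching row.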
The idea is to flag the minimum row number where var == 1 within each group. This will return some warnings, because some groups have no var == 1 cases.
Another option would be this:
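A rough sketch of both ideas on a small subset of the data, assuming a data frame named df (illustrative). The first pipeline flags the minimum row number where var == 1 and warns for groups without a match. The second is one possible warning-free variant, chosen here as an assumption rather than taken from the original post:

```r
library(dplyr)

# Subset of the question's data: group 21 has no var == 1 row.
df <- data.frame(
  group = c(17, 17, 17, 21, 21, 25, 25, 25),
  var   = c(74, 49, 1, 44, 12, 1, 1, 4)
)

# Option 1: minimum row number where var == 1.
# min() of an empty vector returns Inf with a warning (group 21 here);
# no row number equals Inf, so that group gets all zeros.
opt1 <- df %>%
  group_by(group) %>%
  mutate(first = as.integer(row_number() == min(row_number()[var == 1]))) %>%
  ungroup()

# Option 2 (assumed alternative): which(var == 1)[1] is NA when nothing
# matches, and %in% treats NA as "no match", so no warning is raised.
opt2 <- df %>%
  group_by(group) %>%
  mutate(first = as.integer(row_number() %in% which(var == 1)[1])) %>%
  ungroup()
```

Both versions should give the same first column; the difference is only whether empty groups trigger a warning.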
For ungrouped data, one solution is
so
(it seems appropriate to keep this as a logical vector, since that is what the column represents).
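As a sketch of what such a solution might look like: the helper name first_equal_to and the data frame name df are hypothetical, and the cumsum() approach is one possible way to do it, not necessarily the one originally posted.

```r
library(dplyr)

# Hypothetical helper for ungrouped data: TRUE only at the first position
# where x == value. cumsum(hit) counts matches seen so far, so it equals 1
# from the first match up to (not including) the second.
first_equal_to <- function(x, value) {
  hit <- x == value
  hit & cumsum(hit) == 1
}

first_equal_to(c(74, 1, 49, 1), 1)
# FALSE TRUE FALSE FALSE

# The same helper applied per group; the result stays a logical vector.
df <- data.frame(
  group = c(17, 17, 17, 21, 21, 25, 25, 25),
  var   = c(74, 49, 1, 44, 12, 1, 1, 4)
)

out <- df %>%
  group_by(group) %>%
  mutate(first = first_equal_to(var, 1)) %>%
  ungroup()
```

Writing the logic as a plain vector function keeps it testable on its own, and group_by() then handles the per-group application.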
Another implementation is
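One way this could be written is with match(), which returns the position of the first match or NA if there is none. The name first_equal_to2 is hypothetical, and this is a sketch under that assumption rather than the original code:

```r
# Hypothetical alternative helper: start with all FALSE, then set the
# position of the first match (if any) to TRUE.
first_equal_to2 <- function(x, value) {
  result <- logical(length(x))
  idx <- match(value, x)          # first position of value in x, or NA
  if (!is.na(idx)) {
    result[idx] <- TRUE
  }
  result
}

first_equal_to2(c(74, 1, 49, 1), 1)
# FALSE TRUE FALSE FALSE

first_equal_to2(c(44, 12), 1)
# FALSE FALSE
```

It can be used inside a grouped mutate() in the same way as any other vector function that returns one value per row.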