I want to create a conditional dummy variable. Assume that I have a dataset that looks something like this:
Subject Year X X1
A 1990 1 0
A 1991 1 0
A 1992 2 0
A 1993 3 0
A 1994 4 0
A 1995 4 1
B 1990 0 0
B 1991 1 0
B 1992 1 0
B 1993 2 0
B 1994 3 0
C 1990 1 0
C 1991 2 0
C 1992 3 1
C 1993 3 0
D 1990 1 0
D 1991 2 0
D 1992 3 0
D 1993 4 1
D 1994 5 0
E 1990 1 0
E 1991 1 0
E 1992 2 1
E 1993 3 0
Let's call this conditional variable:Q1to3_noX1. Another variable of interest is Q1to3.
The Q1to3 variable is also a dummy variable indicating 1 when the X has reached value 3, and 0 otherwise (for each Subject). If the X is 4 or more, then the Q1to3 variable should be 0. The X is a cumulative variable (0,1,2,3,4...). So in other words, the Q1to3 is 1 if the maximum X value is 3.
I created this variable using: data$Q1to3 <- ave(data$X, data$Subject, FUN = function(x) if (max(x) == 3) 1 else 0)
(thanks to @Zelazny7).
The Q1to3_noX1 variable is very similar to the Q1to3 variable, but in contrast to the Q1to3 , it is conditional on the X1 variable. To be more precise, if the X1 = 1 in the following 5 years (counting from the first year of Q1to3), the Q1to3_no5 should be 0. In other words, the Q1to3_noX1 should be 1 if a)the maximum X value is 3, b) if X1=0 following 5 years(otherwise 0).
I understand from this question that I should use the rle
function. However, I haven't been able to apply it in this particular case. Do you have any suggestions?
The desirable outcome should look like this:
Subject Year X X1 Q1to3 Q1to3_noX1
A 1990 1 0 0 0
A 1991 1 0 0 0
A 1992 2 0 0 0
A 1993 3 0 0 0
A 1994 4 0 0 0
A 1995 4 1 0 0
B 1990 0 0 1 0
B 1991 1 0 1 1
B 1992 1 0 1 1
B 1993 2 0 1 1
B 1994 3 0 1 1
C 1990 1 0 1 0
C 1991 2 0 1 0
C 1992 3 1 1 0
C 1993 3 0 1 0
D 1990 1 0 0 0
D 1991 2 0 0 0
D 1992 3 0 0 0
D 1993 4 1 0 0
D 1994 5 0 0 0
E 1990 1 0 1 0
E 1991 1 0 1 0
E 1992 2 1 1 0
E 1993 3 0 1 0
A reproducible sample:
> dput(data)
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
Year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1990L,
1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L, 1990L,
1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L),
X = c(1L, 1L, 2L, 3L, 4L, 4L, 0L, 1L, 1L, 2L, 3L, 1L, 2L,
3L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L), X1 = c(0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L), Q1to3 = c(0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L), Q1to3_noX1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L)), .Names = c("Subject", "Year", "X", "X1", "Q1to3",
"Q1to3_noX1"), class = "data.frame", row.names = c(NA, -24L))