I have a dataset that looks something like this:
Subject Year X Y
A 1990 1 0
A 1991 1 0
A 1992 2 0
A 1993 3 1
A 1994 4 0
A 1995 4 0
B 1990 0 0
B 1991 1 0
B 1992 1 0
B 1993 2 1
C 1991 1 0
C 1992 2 0
C 1993 3 0
C 1994 3 0
D 1991 1 0
D 1992 2 0
D 1993 3 0
D 1994 4 0
D 1995 5 0
D 1996 5 1
D 1997 6 0
How can I create two additional columns, A1 and A2, where:

- A1 is 1 if X increased and the maximum of X for the subject is at least 4; otherwise it is 0. I tried

    data$A1 <- as.numeric(data$X > 4)

  but it's not quite what I want.
- A2 is more complicated to explain, and I have no idea how to do it in R. It follows the same idea as A1, in that it should still capture all X values greater than 3, but it should be 1 only when Y = 0 for the following 5 years.

Below is an example of what A1 and A2 should look like. Is it possible to do this in R, or do I need to do it manually?
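A minimal base-R sketch of A1, assuming rows are sorted by Year within each Subject and that "X increased" means X is larger than in the subject's previous row. The first row of each subject is compared against 0; that choice is an assumption, picked only because it reproduces the first-year values in the example table (A 1990 and D 1991 both have A1 = 1). `ave()` applies a function separately within each Subject and returns a vector aligned with the original rows.

```r
# Sample data: same values as the dput() output in the question
data <- data.frame(
  Subject = rep(c("A", "B", "C", "D"), times = c(6, 4, 4, 7)),
  Year    = c(1990:1995, 1990:1993, 1991:1994, 1991:1997),
  X = c(1, 1, 2, 3, 4, 4, 0, 1, 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 5, 5, 6),
  Y = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
)

# Assumption: "increased" is judged against the previous row of the same subject
data <- data[order(data$Subject, data$Year), ]

# 1 where X is larger than the subject's previous value (first row compared with 0)
increased <- ave(data$X, data$Subject,
                 FUN = function(x) as.numeric(diff(c(0, x)) > 0))

# TRUE on every row of a subject whose maximum X is at least 4
max_at_least_4 <- ave(data$X, data$Subject, FUN = max) >= 4

data$A1 <- as.numeric(increased == 1 & max_at_least_4)
data$A1  # same values as the A1 column of the desired result
```

The per-subject maximum is what the original attempt `data$X > 4` was missing: that comparison looks at each row on its own and uses `> 4` rather than `>= 4`.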
Desired result:
Subject Year X A1 Y A2
A 1990 1 1 0 0
A 1991 1 0 0 0
A 1992 2 1 0 0
A 1993 3 1 1 0
A 1994 4 1 0 0
A 1995 4 0 0 0
B 1990 0 0 0 0
B 1991 1 0 0 0
B 1992 1 0 0 0
B 1993 2 0 1 0
C 1991 1 0 0 0
C 1992 2 0 0 0
C 1993 3 0 0 0
C 1994 3 0 0 0
D 1991 1 1 0 1
D 1992 2 1 0 1
D 1993 3 1 0 1
D 1994 4 1 0 1
D 1995 5 1 0 1
D 1996 5 0 1 0
D 1997 6 1 0 0
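Taken literally, a 5-year look-ahead from each row would not give D 1992 to 1995 a value of 1, because every such window contains 1996, where Y = 1. One reading that does reproduce the A2 column above is: A2 is 1 for rows that lie inside a run of at least 5 consecutive years with Y = 0, in a subject whose maximum X is greater than 3. The sketch below implements that interpretation with base R's `rle()` (run-length encoding). The interpretation, the hypothetical helper name `in_long_zero_run`, and the assumption that each subject has one row per year with no gaps are assumptions, not part of the original question.

```r
# Sample data: same values as the dput() output in the question
data <- data.frame(
  Subject = rep(c("A", "B", "C", "D"), times = c(6, 4, 4, 7)),
  Year    = c(1990:1995, 1990:1993, 1991:1994, 1991:1997),
  X = c(1, 1, 2, 3, 4, 4, 0, 1, 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 5, 5, 6),
  Y = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
)

# Hypothetical helper: 1 for each element of y that lies in a run of at least
# `min_len` consecutive zeros, 0 otherwise
in_long_zero_run <- function(y, min_len = 5) {
  runs <- rle(y == 0)                                   # runs of TRUE/FALSE
  long_zero <- runs$values & runs$lengths >= min_len    # one flag per run
  as.numeric(rep(long_zero, runs$lengths))              # expand back to rows
}

# Assumption: one row per year, sorted by Year within each Subject, no gaps
data <- data[order(data$Subject, data$Year), ]

zero_run    <- ave(data$Y, data$Subject, FUN = in_long_zero_run)
max_above_3 <- ave(data$X, data$Subject, FUN = max) > 3

data$A2 <- as.numeric(zero_run == 1 & max_above_3)
data$A2  # same values as the A2 column of the desired result
```

The example table cannot tell whether A2 should also require A1 = 1 (rows D 1991 to 1995 have A1 = 1 anyway), so that condition is left out here; changing `min_len` adjusts the required number of zero-Y years.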
Raw data (without the variables A1 and A2):
> dput(data)
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), Year = c(1990L, 1991L, 1992L,
1993L, 1994L, 1995L, 1990L, 1991L, 1992L, 1993L, 1991L, 1992L,
1993L, 1994L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L
), X = c(1L, 1L, 2L, 3L, 4L, 4L, 0L, 1L, 1L, 2L, 1L, 2L, 3L,
3L, 1L, 2L, 3L, 4L, 5L, 5L, 6L), Y = c(0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("Subject",
"Year", "X", "Y"), class = "data.frame", row.names = c(NA, -21L
))