Question:

I have a dataset that looks something like this:

    Subject  Year   X   Y   
        A   1990    1   0   
        A   1991    1   0   
        A   1992    2   0   
        A   1993    3   1   
        A   1994    4   0   
        A   1995    4   0   
        B   1990    0   0   
        B   1991    1   0   
        B   1992    1   0   
        B   1993    2   1   
        C   1991    1   0   
        C   1992    2   0   
        C   1993    3   0   
        C   1994    3   0   
        D   1991    1   0   
        D   1992    2   0   
        D   1993    3   0   
        D   1994    4   0   
        D   1995    5   0   
        D   1996    5   1   
        D   1997    6   0

How can I create two additional columns where

A1 is 1 if X increased and the maximum of X for the subject is at least 4; otherwise it is 0. I tried data$A1 <- as.numeric(data$X > 4), but that is not quite what I want.
A2 is more complicated to explain, and I have no clue how to do it in R. It follows the same idea as A1, in that it should still capture all X values greater than 3, but it should equal 1 only when Y = 0 for the following 5 years. The example below shows what the A2 variable should look like. Is it possible to do this in R, or do I need to do it manually?

Result:

            Subject  Year   X   A1   Y   A2
                A   1990    1    1   0    0
                A   1991    1    0   0    0
                A   1992    2    1   0    0
                A   1993    3    1   1    0
                A   1994    4    1   0    0
                A   1995    4    0   0    0
                B   1990    0    0   0    0
                B   1991    1    0   0    0
                B   1992    1    0   0    0 
                B   1993    2    0   1    0
                C   1991    1    0   0    0
                C   1992    2    0   0    0 
                C   1993    3    0   0    0 
                C   1994    3    0   0    0
                D   1991    1    1   0    1
                D   1992    2    1   0    1
                D   1993    3    1   0    1
                D   1994    4    1   0    1 
                D   1995    5    1   0    1 
                D   1996    5    0   1    0
                D   1997    6    1   0    0

Raw data without the variables A1 and A2:

> dput(data)
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A", 
"B", "C", "D"), class = "factor"), Year = c(1990L, 1991L, 1992L, 
1993L, 1994L, 1995L, 1990L, 1991L, 1992L, 1993L, 1991L, 1992L, 
1993L, 1994L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L
), X = c(1L, 1L, 2L, 3L, 4L, 4L, 0L, 1L, 1L, 2L, 1L, 2L, 3L, 
3L, 1L, 2L, 3L, 4L, 5L, 5L, 6L), Y = c(0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)), .Names = c("Subject", 
"Year", "X", "Y"), class = "data.frame", row.names = c(NA, -21L
))

Answer 1:

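We can do this with data.table:

library(data.table)
# A1: for subjects whose X reaches at least 4, flag rows where X increased (first row counts as 1)
setDT(data)[, A1 := if (any(X >= 4)) c(1, as.numeric(diff(X) > 0)) else 0, by = Subject]
# A2: for subjects whose X reaches at least 3, keep only runs of Y == 0 spanning at least 5 rows
data[, A2 := if (any(X >= 3)) inverse.rle(within.list(rle(Y == 0),
              values[values][lengths[values] < 5] <- 0)) else 0, by = Subject]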

data[, c("Subject", "Year", "X", "A1", "Y", "A2"), with = FALSE]
#    Subject Year X A1 Y A2
# 1:       A 1990 1  1 0  0
# 2:       A 1991 1  0 0  0
# 3:       A 1992 2  1 0  0
# 4:       A 1993 3  1 1  0
# 5:       A 1994 4  1 0  0
# 6:       A 1995 4  0 0  0
# 7:       B 1990 0  0 0  0
# 8:       B 1991 1  0 0  0
# 9:       B 1992 1  0 0  0
#10:       B 1993 2  0 1  0
#11:       C 1991 1  0 0  0
#12:       C 1992 2  0 0  0
#13:       C 1993 3  0 0  0
#14:       C 1994 3  0 0  0
#15:       D 1991 1  1 0  1
#16:       D 1992 2  1 0  1
#17:       D 1993 3  1 0  1
#18:       D 1994 4  1 0  1
#19:       D 1995 5  1 0  1
#20:       D 1996 5  0 1  0
#21:       D 1997 6  1 0  0
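
The A2 line above is fairly dense, so here is a minimal base R sketch of the run-length idea it relies on, applied only to the Y values of subject D from the question. The variable names `y`, `runs` and `a2` are mine and purely illustrative; this is a simplified restatement of the approach, not the exact expression used in the answer.

```r
# Y values for subject D, copied from the question's data
y <- c(0, 0, 0, 0, 0, 1, 0)

# Encode consecutive runs of "Y == 0": values are TRUE/FALSE, lengths are run sizes
runs <- rle(y == 0)

# Keep only runs of zeros that span at least 5 rows; everything else becomes 0
runs$values <- as.numeric(runs$values & runs$lengths >= 5)

# Expand the runs back to one value per row
a2 <- inverse.rle(runs)
print(a2)
```

For subject D this should give 1 for the first five rows and 0 for the last two, which matches the A2 column in the expected result.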

Answer 2:

Does this do the job? Do you need Subject to be a factor? Note that the code below does not yet account for the change of subject between rows, e.g. from C to D.

# Assumes `data` was created from the dput() output in the question
mydata <- data
mydata$max <- FALSE
mydata$A1 <- 0
mydata$A2 <- 0

# Flag every row of a subject whose maximum X is at least 4
for (i in unique(mydata$Subject)) {
  subject_max <- max(mydata$X[mydata$Subject == i])
  if (subject_max >= 4) {
    mydata$max[mydata$Subject == i] <- TRUE
  }
}
# diff() runs over the whole column, so it still crosses subject boundaries
mydata$A1 <- ifelse(mydata$max & c(FALSE, diff(mydata$X) > 0), 1, 0)

A2 is still unclear to me (see also my edit). Hopefully this helps you get the rest done.
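
For what it's worth, here is one possible base R sketch that handles the subject boundaries and also fills in A2. It rests on assumptions that are mine, not the asker's: "Y = 0 for the following 5 years" is read as "the row lies inside a run of at least 5 consecutive rows with Y == 0" (the same reading Answer 1 uses), A2 is only set for subjects whose maximum X is at least 4, and the first row of a subject counts as an increase. The helpers `flag_increase` and `flag_zero_run` are hypothetical names introduced for illustration.

```r
# Rebuild the question's data (values copied from the dput() output)
data <- data.frame(
  Subject = rep(c("A", "B", "C", "D"), times = c(6, 4, 4, 7)),
  Year = c(1990:1995, 1990:1993, 1991:1994, 1991:1997),
  X = c(1, 1, 2, 3, 4, 4, 0, 1, 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 5, 5, 6),
  Y = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
)

# Hypothetical helper: 1 where X increased within one subject, but only if
# that subject's maximum is at least 4; the first row counts as an increase
flag_increase <- function(x) {
  if (max(x) >= 4) as.numeric(c(TRUE, diff(x) > 0)) else rep(0, length(x))
}

# Hypothetical helper: 1 for rows inside a run of at least 5 zeros
flag_zero_run <- function(y) {
  runs <- rle(y == 0)
  runs$values <- as.numeric(runs$values & runs$lengths >= 5)
  inverse.rle(runs)
}

# ave() applies each helper separately within every subject
data$A1 <- ave(data$X, data$Subject, FUN = flag_increase)
data$A2 <- ave(data$Y, data$Subject, FUN = flag_zero_run)

# Assumption: A2 also requires the subject's maximum X to be at least 4
data$A2 <- data$A2 * (ave(data$X, data$Subject, FUN = max) >= 4)

print(data[, c("Subject", "Year", "X", "A1", "Y", "A2")])
```

Because `ave()` splits by Subject before calling the helper, `diff()` never compares the last row of one subject with the first row of the next. If the intended meaning of A2 differs from the assumption above, only `flag_zero_run` should need to change.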