Select specific rows based on previous row value

Posted 2019-05-30 17:00

I've been trying to figure out a way to script this in R, but I just can't get it. I have a dataset like this:

Trial  Type Correct Latency     
1       55  0       0
3       30  1       766
4       10  1       344
6       40  1       716
7       10  1       326
9       30  1       550
10      10  1       350
11      64  0       0
13      30  1       683
14      10  1       270
16      30  1       666
17      10  1       297
19      40  1       616
20      10  1       315
21      64  0       0
23      40  1       850
24      10  1       322
26      30  1       566
27      20  0       766
28      40  1       500
29      20  1       230

which goes on for much longer (around 1000 rows).

From this one dataset, I would like to create 4 separate data.frames/tables that I can export and also use for my own calculations.

I would like to have a data.frame (4 in total), one for each of these bullet points:

  • type 10 rows which are preceded by a type 30 row
  • type 10 rows which are preceded by a type 40 row
  • type 20 rows which are preceded by a type 30 row
  • type 20 rows which are preceded by a type 40 row

I would like for all the columns in the relevant rows to be placed into these new tables, but only including the column info of row types 10 or 20.

For example, the first table (type 10 preceded by type 30) would look like this based on the sample data:

Trial  Type Correct Latency     
  4       10     1       344
  10      10     1       350
  14      10     1       270
  17      10     1       297

Second table (type 10 preceded by type 40):

Trial    Type  Correct  Latency     
  7       10     1       326
  20      10     1       315
  24      10     1       322

Third table (type 20 preceded by type 30):

Trial    Type  Correct  Latency     
  27      20     0       766

Fourth table (type 20 preceded by type 40):

Trial    Type  Correct   Latency        
 29      20      1        230

I can subset just fine to get one table of only type 10 rows and another of only type 20 rows, but I can't figure out how to create different tables for type 10 and 20 rows based on the previous row's type. Another issue is that "Trial" is not consecutive (it skips numbers).

Any help would be greatly appreciated. Thank you.

Also, is there a way to include the previous row as well, so the output for the fourth table would look something like this:

Fourth table (type 20 preceded by type 40):

Trial    Type  Correct   Latency        
 28      40      1        500
 29      20      1        230

2 Answers
孤傲高冷的网名
Reply #2 · 2019-05-30 17:21

Here is some example code for the case where you always want to delete the first trials in your data.

library(dplyr)  # provides filter() and the vector version of lag() used below

var1 <- c(1,2,1,2,1,2,1,2,1,2)
var2 <- c(1,1,1,2,2,2,2,3,3,3)

dat <- data.frame(var1, var2)

var1 var2
1     1    1
2     2    1
3     1    1
4     2    2
5     1    2
6     2    2
7     1    2
8     2    3
9     1    3
10    2    3

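# keep only rows whose var2 equals the previous row's var2, i.e. delete
# the first row of every block (row 1 compares against NA and is dropped too)
filter(dat, lag(var2) == var2)

var1 var2
1     2    1
2     1    1
3     1    2
4     2    2
5     1    2
6     1    3
7     2    3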

# delete the first 2 trials of every block (including the first 2 rows of the data)
# find all rows where var2[n-1] != var2[n], using lag() from dplyr
changes <- which(lag(dat$var2) != dat$var2)
drops <- c(1, 2, changes, changes + 1)
dat <- dat[-drops, ]

var1 var2
3     1    1
6     2    2
7     1    2
10    2    3
贪生不怕死
Reply #3 · 2019-05-30 17:25

For the fourth example, you could use which() in combination with lag() from dplyr to obtain the indices of the rows that meet your criteria. You can then use these indices to subset the data.frame.

# Get indices of rows that meet the condition
ind2 <- which(df$Type == 20 & dplyr::lag(df$Type) == 40)
# Get indices of the rows just before them
ind1 <- ind2 - 1

# Subset the rows; sort() keeps each previous row next to its match
# (the output below was printed from a data.table, hence the "1:" row labels)
df[sort(c(ind1, ind2)), ]
   Trial Type Correct Latency
1:    28   40       1     500
2:    29   20       1     230
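
The same idea can be extended to all four tables from the question. The following is only a sketch in base R, assuming the sample data from the question is stored in a data.frame called df; the helper rows_after() and the names of the result tables are made up for illustration and are not part of any package. The previous row's type is taken by position, so the gaps in Trial do not matter.

```r
# Sample data copied from the question
df <- data.frame(
  Trial   = c(1, 3, 4, 6, 7, 9, 10, 11, 13, 14, 16, 17, 19, 20, 21, 23, 24, 26, 27, 28, 29),
  Type    = c(55, 30, 10, 40, 10, 30, 10, 64, 30, 10, 30, 10, 40, 10, 64, 40, 10, 30, 20, 40, 20),
  Correct = c(0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  Latency = c(0, 766, 344, 716, 326, 550, 350, 0, 683, 270, 666, 297, 616, 315, 0, 850, 322, 566, 766, 500, 230)
)

# Hypothetical helper: rows of type `cur` whose preceding row (by position) is type `prev`.
# With keep_previous = TRUE the preceding row is returned as well.
rows_after <- function(data, cur, prev, keep_previous = FALSE) {
  # Type of the previous row; NA for the first row, which has no predecessor
  prev_type <- c(NA, head(data$Type, -1))
  # which() silently drops the NA comparison for the first row
  idx <- which(data$Type == cur & prev_type == prev)
  if (keep_previous) {
    idx <- sort(c(idx - 1, idx))
  }
  data[idx, ]
}

t10_after_30 <- rows_after(df, cur = 10, prev = 30)
t10_after_40 <- rows_after(df, cur = 10, prev = 40)
t20_after_30 <- rows_after(df, cur = 20, prev = 30)
t20_after_40 <- rows_after(df, cur = 20, prev = 40)

# Variant that also keeps the preceding type 40 row
t20_after_40_with_prev <- rows_after(df, cur = 20, prev = 40, keep_previous = TRUE)
```

Each result is an ordinary data.frame, so it can be exported with something like write.csv(t10_after_30, "type10_after_30.csv", row.names = FALSE), where the file name is just a placeholder.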