Select rows before a filtered row using dplyr

I'm working on a study where we used a camera placed inside a nest box to determine when our study species laid its first egg. Some of the cameras weren't super reliable, and I'd like to see if there were continuous photos before the date where the first egg was laid. This way I can no for sure that this is the first egg date. There are >165,000 photos and >200 nests, so I grouped by nest box ID, filtered the rows down to those that have at least 1 egg, and then used the slice function to select the first row with data. Here's a reproducible example:

example <- structure(list(boxID = c("CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10"), visitType = c("Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image"), day = c(25L, 25L, 25L, 26L, 
    26L, 26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L), month = c("MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR"), year = c(2018, 2018, 2018, 
    2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
    2018), timeChecked = c("02:59", "09:06", "15:13", "02:59", "09:07", 
    "15:14", "02:59", "09:07", "15:13", "02:58", "09:06", "15:12", 
    "02:58", "09:06", "15:12"), species = c("Empty", "Empty", "Empty", 
    "Empty", "Empty", "Empty", "Empty", "Empty", "American Kestrel", 
    "Empty", "American Kestrel", "American Kestrel", "American Kestrel", 
    "American Kestrel", "American Kestrel"), sexAdult = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, "Female", NA, "Female", "Female", "Female", 
    NA, NA), numEggs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    "1", "1", "1", "1", "1"), numNestlings = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), date = structure(c(17615, 
    17615, 17615, 17616, 17616, 17616, 17617, 17617, 17617, 17618, 
    17618, 17618, 17619, 17619, 17619), class = "Date")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("boxID", 
    "visitType", "day", "month", "year", "timeChecked", "species", 
    "sexAdult", "numEggs", "numNestlings", "date"))

Here's the code I have to find the first row that has at least 1 egg:

example %>%
      mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% # remove unknowns and other values that should be NA
      select(boxID, date, numEggs, visitType) %>%
      group_by(boxID) %>%
      filter(numEggs > 0) %>%
      slice(1)

I'd like to look at the 5 or 10 rows before this first row with an egg to make sure there was continuous data up to this point in time. Is there a way to do this row indexing with slice or some other dplyr function?

标签： r dplyr tidyverse

2条回答

你好瞎i

2楼-- · 2019-07-19 05:47

Here a way based on slicing according to the position of the first non missing value of numEggs. You can modify the 5 value inside the last row according to the rows you want retain before the first non-NA numEggs

example %>%
      mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>%
      select(boxID, date, numEggs, visitType) %>%
      group_by(boxID) %>%
      slice((min(which(!is.na(numEggs)))-5):min(which(!is.na(numEggs))))

    # A tibble: 6 x 4
# Groups:   boxID [1]
  boxID date       numEggs visitType
  <chr> <date>     <chr>   <chr>    
1 CA10  2018-03-26 NA      Image    
2 CA10  2018-03-27 NA      Image    
3 CA10  2018-03-27 NA      Image    
4 CA10  2018-03-27 NA      Image    
5 CA10  2018-03-28 NA      Image    
6 CA10  2018-03-28 1       Image

0人赞添加讨论(0) 举报

Viruses.

3楼-- · 2019-07-19 05:54

Here's one way. match returns the position of first numEggs > 0 and then we simply get additional n_previous rows from that position. We use max(1, ...) so that we don't get error if position of first numEggs > 0 < n_previous .

n_previous <- 5

example %>%
  mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% 
  select(boxID, date, numEggs, visitType) %>%
  group_by(boxID) %>%
  slice(max(1, match(TRUE, numEggs > 0) - n_previous):match(TRUE, numEggs > 0))

# A tibble: 6 x 4
# Groups:   boxID [1]
  boxID date       numEggs visitType
  <chr> <date>     <chr>   <chr>    
1 CA10  2018-03-26 <NA>    Image    
2 CA10  2018-03-27 <NA>    Image    
3 CA10  2018-03-27 <NA>    Image    
4 CA10  2018-03-27 <NA>    Image    
5 CA10  2018-03-28 <NA>    Image    
6 CA10  2018-03-28 1       Image

0人赞添加讨论(0) 举报

Select rows before a filtered row using dplyr

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间