I'm working on a study where we used a camera placed inside a nest box to determine when our study species laid its first egg. Some of the cameras weren't super reliable, and I'd like to see if there were continuous photos before the date where the first egg was laid. This way I can no for sure that this is the first egg date. There are >165,000 photos and >200 nests, so I grouped by nest box ID, filtered the rows down to those that have at least 1 egg, and then used the slice function to select the first row with data. Here's a reproducible example:
example <- structure(list(boxID = c("CA10", "CA10", "CA10", "CA10", "CA10",
"CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10",
"CA10", "CA10"), visitType = c("Image", "Image", "Image", "Image",
"Image", "Image", "Image", "Image", "Image", "Image", "Image",
"Image", "Image", "Image", "Image"), day = c(25L, 25L, 25L, 26L,
26L, 26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L), month = c("MAR",
"MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR",
"MAR", "MAR", "MAR", "MAR", "MAR"), year = c(2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018), timeChecked = c("02:59", "09:06", "15:13", "02:59", "09:07",
"15:14", "02:59", "09:07", "15:13", "02:58", "09:06", "15:12",
"02:58", "09:06", "15:12"), species = c("Empty", "Empty", "Empty",
"Empty", "Empty", "Empty", "Empty", "Empty", "American Kestrel",
"Empty", "American Kestrel", "American Kestrel", "American Kestrel",
"American Kestrel", "American Kestrel"), sexAdult = c(NA, NA,
NA, NA, NA, NA, NA, NA, "Female", NA, "Female", "Female", "Female",
NA, NA), numEggs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"1", "1", "1", "1", "1"), numNestlings = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), date = structure(c(17615,
17615, 17615, 17616, 17616, 17616, 17617, 17617, 17617, 17618,
17618, 17618, 17619, 17619, 17619), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("boxID",
"visitType", "day", "month", "year", "timeChecked", "species",
"sexAdult", "numEggs", "numNestlings", "date"))
Here's the code I have to find the first row that has at least 1 egg:
example %>%
mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% # remove unknowns and other values that should be NA
select(boxID, date, numEggs, visitType) %>%
group_by(boxID) %>%
filter(numEggs > 0) %>%
slice(1)
I'd like to look at the 5 or 10 rows before this first row with an egg to make sure there was continuous data up to this point in time. Is there a way to do this row indexing with slice or some other dplyr function?
Here a way based on slicing according to the position of the first non missing value of
numEggs
. You can modify the 5 value inside the last row according to the rows you want retain before the first non-NAnumEggs
Here's one way.
match
returns the position of firstnumEggs > 0
and then we simply get additionaln_previous
rows from that position. We usemax(1, ...)
so that we don't get error if position of firstnumEggs > 0
<
n_previous
.