I am a fairly new user and need help with a task I am stuck on. If this question has been asked or answered before, I would be grateful if you could point me to the relevant page.
I have the following data set (lbnp_br) which is optical density (OD) measured over time (in seconds):
time OD
1891 -244.6
1891.5 -244.4
1892 -242
1892.5 -242
1893 -241.1
1893.5 -242.4
1894 -245.2
1894.5 -249.6
**1895 -253.9**
1895.5 -254.5
1896 -251.9
1896.5 -246.7
1897 -242.4
1897.5 -234.6
1898 -225.5
I need to find out how responsive the study device is by measuring how long it takes to reach the threshold for optical density.
For this I have calculated the coefficient of variation (CV) of OD and I am using mean OD (-252.9098) +/- 2*CV to define a response threshold. For the above data the threshold is set as (mean OD + 2*CV = -252.9917), and (mean OD - 2*CV = -252.8278).
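For readers following the arithmetic, here is a minimal sketch of how such a band could be computed, assuming CV is taken as sd/mean. The baseline readings below are made-up values, not data from this study, and the variable names are only illustrative.

```r
# Hypothetical baseline OD readings (illustrative values, not from the post)
baseline_od <- c(-250, -252, -254, -256)

mean_od <- mean(baseline_od)              # mean OD of the baseline
cv      <- sd(baseline_od) / mean_od      # assumed definition: sd / mean

thr_plus  <- mean_od + 2 * cv             # analogous to sup_2_plus
thr_minus <- mean_od - 2 * cv             # analogous to sup_2_minus
```

Because the mean OD is negative, a CV defined this way is negative too, so mean + 2*CV ends up as the numerically lower bound. That matches the values quoted above, where the "plus" threshold (-252.9917) is below the "minus" threshold (-252.8278).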
I now need to calculate the time in seconds from the start (1891 seconds) to the first OD value that exceeds the +/- threshold values. For example, for the above data frame the threshold is exceeded at 1895 seconds, corresponding to an OD of -253.9.
I have to repeat this 3 times for each study subject, with 17 subjects overall. I am therefore looking for a function where I can define the data frame and the threshold values, and which returns the first OD value that exceeds the defined thresholds (all_threshold$sup_2_minus and all_threshold$sup_2_plus) together with its corresponding time.
I have tried subset, as advised elsewhere:
subset(lbnp_br, lbnp_br$OD < all_threshold$sup_2_minus & lbnp_br$OD > all_threshold$sup_2_plus)
However, this doesn't return what I am looking for. I have also tried:
ifelse(lbnp_br$OD > all_threshold$sup_2_plus & lbnp_br$OD < all_threshold$sup_2_minus, lbnp_br$OD, NA)
which returns NA and doesn't specify the exact value of OD and the time.
This is not a short answer, but hopefully clear. It uses the dplyr package:
library(dplyr)

find_time = function(df, threshold){
  return_value = df %>%
    arrange(time) %>%
    filter(OD < threshold) %>%
    slice(1)
  return(return_value)
}

find_time(data, threshold)
This will sort (arrange) your data based on time, subset (filter) your data for values of OD below the threshold, take the first value (slice), and return it.
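For anyone without dplyr, the same three steps (sort, filter, take the first row) can be sketched in base R. The data frame below reproduces the sample from the question, the threshold is the question's mean OD + 2*CV value, and the name first_below is made up for this illustration.

```r
# Sample data copied from the question
lbnp_br <- data.frame(
  time = seq(1891, 1898, by = 0.5),
  OD = c(-244.6, -244.4, -242, -242, -241.1, -242.4, -245.2, -249.6,
         -253.9, -254.5, -251.9, -246.7, -242.4, -234.6, -225.5)
)

# Hypothetical helper: first row (in time order) with OD below the threshold
first_below <- function(df, threshold) {
  df <- df[order(df$time), ]          # sort by time
  hits <- df[df$OD < threshold, ]     # keep rows below the threshold
  hits[1, ]                           # first row; a row of NAs if none
}

res <- first_below(lbnp_br, -252.9917)
elapsed <- res$time - min(lbnp_br$time)   # seconds since the first sample
```

On the sample data this gives time 1895 and OD -253.9, i.e. 4 seconds after the first sample at 1891, which agrees with the row highlighted in the question. The sketch assumes OD contains no missing values.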
A one liner:
function (dfr, threshold) dfr$OD[ min(which(dfr$OD > threshold)) ]
Gives a warning and NA if there is no such row in the data frame, which is probably what you want.
An alternative, purrr-based solution:
function (dfr, threshold) purrr::detect(dfr$OD, ~ .x > threshold)
which returns NULL if nothing is found; more correct, I guess.
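Both versions above return only the OD value, while the question also asks for the time. As a rough sketch, base R's Position() returns the index of the first match (NA if there is none), which can then be used to look up the time, and Find() mirrors the NULL-on-no-match behaviour. The vectors below are a small subset of the question's data, and the "below threshold" direction is an assumption taken from the example in the question.

```r
od   <- c(-244.6, -249.6, -253.9, -254.5)   # subset of the question's OD values
time <- c(1891, 1894.5, 1895, 1895.5)       # matching times

# Index of the first OD value below the threshold
idx <- Position(function(x) x < -252.9917, od)
first_time <- time[idx]
first_od   <- od[idx]

# Behaviour when nothing crosses the threshold
no_idx <- Position(function(x) x < -300, od)   # NA
no_val <- Find(function(x) x < -300, od)       # NULL
```

Here idx is 3, so first_time is 1895 and first_od is -253.9.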
Using the above code, I added a few extra conditions to get exactly what I was looking for and here it is for anyone who may need something similar:
# Requires library(dplyr). slice(1) replaces the deprecated slice_(1).
find_time <- function(df, df2, df3, threshold_1, threshold_2, threshold_3, threshold_4, threshold_5, threshold_6){
  # HDT: first value above the upper threshold
  return_value_1 = df %>%
    arrange(time) %>%
    filter(OD > threshold_1) %>%
    slice(1)
  colnames(return_value_1)[1] <- "time_hdt_upper"
  colnames(return_value_1)[2] <- "OD_hdt_upper"
  if (nrow(return_value_1) == 0) {
    return_value_1[1,1] <- NA
    return_value_1[1,2] <- NA
  }
  # HDT: first value below the lower threshold
  return_value_2 = df %>%
    arrange(time) %>%
    filter(OD < threshold_2) %>%
    slice(1)
  colnames(return_value_2)[1] <- "time_hdt_lower"
  colnames(return_value_2)[2] <- "OD_hdt_lower"
  if (nrow(return_value_2) == 0) {
    return_value_2[1,1] <- NA
    return_value_2[1,2] <- NA
  }
  # LBNP: first value above the upper threshold
  return_value_3 = df2 %>%
    arrange(time) %>%
    filter(OD > threshold_3) %>%
    slice(1)
  colnames(return_value_3)[1] <- "time_lbnp_upper"
  colnames(return_value_3)[2] <- "OD_lbnp_upper"
  if (nrow(return_value_3) == 0) {
    return_value_3[1,1] <- NA
    return_value_3[1,2] <- NA
  }
  # LBNP: first value below the lower threshold
  return_value_4 = df2 %>%
    arrange(time) %>%
    filter(OD < threshold_4) %>%
    slice(1)
  colnames(return_value_4)[1] <- "time_lbnp_lower"
  colnames(return_value_4)[2] <- "OD_lbnp_lower"
  if (nrow(return_value_4) == 0) {
    return_value_4[1,1] <- NA
    return_value_4[1,2] <- NA
  }
  # HUT: first value above the upper threshold
  return_value_5 = df3 %>%
    arrange(time) %>%
    filter(OD > threshold_5) %>%
    slice(1)
  colnames(return_value_5)[1] <- "time_hut_upper"
  colnames(return_value_5)[2] <- "OD_hut_upper"
  if (nrow(return_value_5) == 0) {
    return_value_5[1,1] <- NA
    return_value_5[1,2] <- NA
  }
  # HUT: first value below the lower threshold
  return_value_6 = df3 %>%
    arrange(time) %>%
    filter(OD < threshold_6) %>%
    slice(1)
  colnames(return_value_6)[1] <- "time_hut_lower"
  colnames(return_value_6)[2] <- "OD_hut_lower"
  if (nrow(return_value_6) == 0) {
    return_value_6[1,1] <- NA
    return_value_6[1,2] <- NA
  }
  return(data.frame(return_value_1, return_value_2, return_value_3, return_value_4, return_value_5, return_value_6))
}
which gives
find_time_threshold <- find_time(hdt_br, lbnp_br, hut_br, all_threshold$base_plus, all_threshold$base_minus, all_threshold$sup_2_plus, all_threshold$sup_2_minus, all_threshold$sup_3_plus, all_threshold$sup_3_minus)
> find_time_threshold
time_hdt_upper OD_hdt_upper time_hdt_lower OD_hdt_lower time_lbnp_upper OD_lbnp_upper time_lbnp_lower
1 596.5 123.3 506 91.3 NA NA 1706
OD_lbnp_lower time_hut_upper OD_hut_upper time_hut_lower OD_hut_lower
1 -27.89 3186.5 -82.98 2909 -211.7
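Since the six blocks in the function differ only in the data frame, the threshold and the direction of the comparison, the repetition could be factored out, which may help when running this for 17 subjects. The following is only a sketch in base R: first_crossing, find_times, and the toy data and thresholds are all hypothetical names and values chosen for illustration, not part of the code above.

```r
# Hypothetical helper: first row crossing a threshold in the given direction
first_crossing <- function(df, threshold, direction = c("upper", "lower")) {
  direction <- match.arg(direction)
  df <- df[order(df$time), ]                        # sort by time
  hit <- if (direction == "upper") df$OD > threshold else df$OD < threshold
  idx <- which(hit)[1]                              # NA when nothing crosses
  data.frame(time = df$time[idx], OD = df$OD[idx])  # one row, NA if no hit
}

# Hypothetical wrapper: one wide row with time_<cond>_<dir> / OD_<cond>_<dir>
find_times <- function(conditions, thresholds) {
  out <- list()
  for (cond in names(conditions)) {
    for (dir in c("upper", "lower")) {
      res <- first_crossing(conditions[[cond]], thresholds[[cond]][[dir]], dir)
      names(res) <- paste(names(res), cond, dir, sep = "_")
      out[[length(out) + 1]] <- res
    }
  }
  do.call(cbind, out)
}

# Toy data and thresholds, for illustration only
toy <- data.frame(time = c(1, 2, 3), OD = c(0, 5, -5))
conditions <- list(hdt = toy, lbnp = toy)
thresholds <- list(hdt  = list(upper = 1,  lower = -1),
                   lbnp = list(upper = 10, lower = -1))

result <- find_times(conditions, thresholds)
```

With the toy input, the lbnp upper threshold is never crossed, so time_lbnp_upper and OD_lbnp_upper come back as NA, in the same way as the NA columns in the output above. Looping this over a list of subjects would then yield one row per subject.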