I would like to subset rows of my data
library(data.table); set.seed(333); n <- 100
dat <- data.table(id=1:n, group=rep(1:2,each=n/2), x=runif(n,100,120), y=runif(n,200,220), z=runif(n,300,320))
> head(dat)
id group x y z
1: 1 1 109.3400 208.6732 308.7595
2: 2 1 101.6920 201.0989 310.1080
3: 3 1 119.4697 217.8550 313.9384
4: 4 1 111.4261 205.2945 317.3651
5: 5 1 100.4024 212.2826 305.1375
6: 6 1 114.4711 203.6988 319.4913
in several stages within each group. I need to automate this, and it can happen that a subset comes up empty. For example, focusing only on group 1,
dat1 <- dat[1:50]
> s <- subset(dat1, x > 119)
> s
id group x y z
1: 3 1 119.4697 217.8550 313.9384
2: 50 1 119.2519 214.2517 318.8567
the second step subset(s, y>219) would come up empty but I would still want to apply the third step subset(s, z>315). If I were to set the threshold manually, Frank has provided an excellent solution here that outputs
> f(dat1, x>119, y>219, z>315)
cond skip
1: x > 119 FALSE
2: y > 219 TRUE
3: z > 315 FALSE
id group x y z
1: 50 1 119.2519 214.2517 318.8567
and reports which parts were skipped.
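To make the skip-if-empty behaviour concrete, here is a minimal sketch of the idea. It is not Frank's f; the helper name staged_subset and the toy values are made up for illustration, and it assumes each condition is a plain logical expression on the columns.

```r
library(data.table)

# Hypothetical helper (not Frank's f): apply the conditions in order and
# skip any condition that would leave no rows.
staged_subset <- function(d, ...) {
  conds <- as.list(substitute(list(...)))[-1L]   # unevaluated conditions
  skipped <- logical(length(conds))
  for (i in seq_along(conds)) {
    keep <- eval(conds[[i]], d, parent.frame())  # evaluate on current rows
    keep <- !is.na(keep) & keep                  # treat NA as "does not match"
    if (any(keep)) d <- d[keep] else skipped[i] <- TRUE
  }
  labels <- vapply(conds, function(e) paste(deparse(e), collapse = " "),
                   character(1))
  list(result = d, log = data.table(cond = labels, skip = skipped))
}

# Toy data (made-up values): the y stage matches nothing and is skipped.
toy <- data.table(id = 1:4, x = c(1, 5, 6, 7), y = c(9, 1, 2, 3), z = c(0, 4, 8, 9))
out <- staged_subset(toy, x > 4, y > 5, z > 5)
out$log     # which stages were skipped
out$result  # rows with id 3 and 4 remain
```

On the real data the call would look like staged_subset(dat1, x>119, y>219, z>315).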
My problem is that I need to apply this to different groups simultaneously, where the thresholds for each group are given in a separate data.table. The goal is to have at least one id per group. For example, if my thresholds were
c <- data.table(group=1:2, x=c(119,119), y=c(219,219), z=c(315,319))
> c
group x y z
1: 1 119 219 315
2: 2 119 219 319
I would like to end up with
> res
id group x y z
1: 50 1 119.2519 214.2517 318.8567
2: 55 2 119.2634 219.0044 315.6556
I could apply Frank's function repeatedly within a for-loop but I am sure there are cleverer ways that save time. I wonder, for instance, whether the function can be applied to each group within data.table. Or perhaps there is a way within the tidyverse, which I am not really familiar with yet.
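For illustration, the per-group behaviour I am after could be sketched roughly as follows. This is only a sketch under assumptions: the helper staged_by_threshold, the table thr and the toy values are hypothetical, every stage is assumed to be "column > threshold", and I have not checked whether this is actually faster than a for-loop.

```r
library(data.table)

# Toy data and per-group thresholds (made-up values, not the data above).
toy <- data.table(id = 1:6, group = rep(1:2, each = 3),
                  x = c(1, 5, 6, 2, 7, 8),
                  y = c(9, 1, 2, 9, 6, 1),
                  z = c(0, 4, 8, 0, 3, 9))
thr <- data.table(group = 1:2, x = c(4, 4), y = c(5, 5), z = c(5, 5))

# Hypothetical helper: for one group's rows, apply "column > threshold"
# column by column, skipping a stage that would leave the group empty.
staged_by_threshold <- function(sd, g, limits_tbl, cols) {
  row <- match(g, limits_tbl$group)        # thresholds row for this group
  keep <- rep(TRUE, nrow(sd))
  for (col in cols) {
    cand <- keep & sd[[col]] > limits_tbl[[col]][row]
    cand[is.na(cand)] <- FALSE
    if (any(cand)) keep <- cand            # otherwise skip this stage
  }
  sd[keep]
}

cols <- c("x", "y", "z")
# Run the helper once per group; .BY$group is the current group's value.
res <- toy[, staged_by_threshold(.SD, .BY$group, thr, cols), by = group]
setcolorder(res, names(toy))               # put id back in front of group
res                                        # one row per group: id 3 and id 5
```

With the real data, toy and thr would be replaced by dat and the thresholds table.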