可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I wish to match two dataframes based on conditionals on more than one column but cannot figure out how. So if there are my data sets:

df1 <- data.frame(lower=c(0,5,10,15,20), upper=c(4,9,14,19,24), x=c(12,45,67,89,10))
df2 <- data.frame(age=c(12, 14, 5, 2, 9, 19, 22, 18, 23))

I wish to match age from df2 that falls into the range between lower and upper in df1 with the aim to add an extra column to df2 containing the value of x in df1 where age lies between upper and lower. i.e. I want df2 to look like

age    x
12    67
14    67
 5    45
....etc.

How can I achieve such a match ?

回答1:

I would go with a simple sapply and a "anded" condition in the df1$x selection like this:

df2$x <- sapply( df2$age, function(x) { df1$x[ x >= df1$lower & x <= df1$upper ] })

which gives:

For age 12 for example the selection inside the brackets gives:

> 12 >= df1$lower & 12 <= df1$upper
[1] FALSE FALSE  TRUE FALSE FALSE

So getting df1$x by this logical vector is easy as your ranges don't overlap

回答2:

Using foverlaps from data.table is what you are looking for:

library(data.table)
setDT(df1)
setDT(df2)[,age2:=age]
setkey(df1,lower,upper)
foverlaps(df2, df1, by.x = names(df2),by.y=c("lower","upper"))[,list(age,x)]

#    age  x
# 1:  12 67
# 2:  14 67
# 3:   5 45
# 4:   2 12
# 5:   9 45
# 6:  19 89
# 7:  22 10
# 8:  18 89
# 9:  23 10

回答3:

Here's another vectorized approach using findInterval on a melted data set

library(data.table) 
df2$x <- melt(setDT(df1), "x")[order(value), x[findInterval(df2$age, value)]]
#   age  x
# 1  12 67
# 2  14 67
# 3   5 45
# 4   2 12
# 5   9 45
# 6  19 89
# 7  22 10
# 8  18 89
# 9  23 10

The idea here is to

First, tidy up you data so lower and upper will be in the same column and x will have corresponding values to that new column,
Then, sort the data according to these ranges (necessary for findInterval).
Finally, run findInterval within the x column in order to find the correct incidences

And here's a possible dplyr/tidyr version

library(tidyr)
library(dplyr)
df1 %>%
  gather(variable, value, -x) %>%
  arrange(value) %>%
  do(data.frame(x = .$x[findInterval(df2$age, .$value)])) %>%
  cbind(df2, .)
#   age  x
# 1  12 67
# 2  14 67
# 3   5 45
# 4   2 12
# 5   9 45
# 6  19 89
# 7  22 10
# 8  18 89
# 9  23 10