I have used the call below to "join" my datasets based on an inequality condition:
library(sqldf)
sqldf("select *
from dataset1 a,
dataset2 b
a.col1 <= b.col2")
However, is there a way I can do this without sqldf?
So far, I can only see merge functions that are based on simple joins on a particular common column.
Thanks!
Non-equi (or conditional) joins were recently implemented in data.table, and available in the current development version, v1.9.7. See installation instructions here.
require(data.table) # v1.9.7+
setDT(dataset1) # convert to data.tables
setDT(dataset2)
dataset1[dataset2, on=.(col1 < col2), nomatch=0L]
For each row of dataset2
, find matching row indices while joining on condition provided to the on
argument, and return all columns for those matching rows.
You could definitely do it in two steps utilizing merge
.
Example (the exact details of the merge are up to you):
lessRows <- which(df1$col1 < df2$col2)
df3 <- merge(df1, df2)[lessRows, ]
I've had that problem a few times and I think I got a solution using dplyr
! It might not be the best in terms of efficiency, but it works. I'll suppose you have a constant variable in each case called 'dummy' (or alternatively, it can be another variable to join by). Moreover, I assume dataset1's columns are a_colj and those of dataset2 are b_colj:
dataset1 %>%
inner_join(dataset2, by='dummy') %>%
filter(a_col1 <= b_col2)