I have used the call below to "join" my datasets based on an inequality condition:
library(sqldf)
sqldf("select *
from dataset1 a,
dataset2 b
a.col1 <= b.col2")
However, is there a way I can do this without sqldf?
So far, I can only see merge functions that are based on simple joins on a particular common column.
Thanks!
I've had that problem a few times and I think I got a solution using
dplyr
! It might not be the best in terms of efficiency, but it works. I'll suppose you have a constant variable in each case called 'dummy' (or alternatively, it can be another variable to join by). Moreover, I assume dataset1's columns are a_colj and those of dataset2 are b_colj:Non-equi (or conditional) joins were recently implemented in data.table, and available in the current development version, v1.9.7. See installation instructions here.
For each row of
dataset2
, find matching row indices while joining on condition provided to theon
argument, and return all columns for those matching rows.You could definitely do it in two steps utilizing
merge
.Example (the exact details of the merge are up to you):