Hello I would like to perform a rolling join in R using the data.table
package. There are multiple matches when joining on the "Date" column so I would like to use the by
argument in data.table
on the "Field" column to keep data from different fields from being joined together.
Example Data
d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1",
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S",
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7",
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S",
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1",
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417,
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179,
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266,
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373,
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366,
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"),
NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA,
NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227,
16764.9745173044, NA, 7671.24950330348, 21341.6444661863,
5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297,
22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694,
17766.9873538952, NA, 44728.1837479613, 2279.60909695434,
2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481,
NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA,
5928.31591997149, NA, 22162.2028819815, 18972.2228621189,
6534.4257935542, 12630.9231775315, NA)), .Names = c("Field",
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA,
-50L), .internal.selfref = <pointer: 0x0000000006540788>)
d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S",
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7",
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62",
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S",
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62",
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443,
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219,
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104,
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241,
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952,
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382,
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628,
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436,
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807,
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543,
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224,
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431,
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703,
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684,
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338,
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024,
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979,
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303,
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table",
"data.frame"), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x0000000006540788>)
What I have tried
This is my attempt to perform a rolling join on the "Date" column, grouped by
the "Field" column. Obviously I can split the data by "Field" and process separately but I would like to avoid that option.
d1[d2, roll = "nearest", on = .(Date), by = .(Field)]
Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) :
'by' or 'keyby' is supplied but not j
You were almost there.
You can join on multiple columns simultaneously. So, in addition to "Date", you can include "Field" in the
on
clause. But please note the description of theroll
argument in?data.table
:Thus, for "Date" to be used for the rolling join, specify it as the last variable in
on
:For better verification, the result can be ordered