Suppose I have two data.table's:
A:
A B
1: 1 12
2: 2 13
3: 3 14
4: 4 15
B:
A B
1: 2 13
2: 3 14
and I have the following code:
merge_test = merge(dataA, dataB, by="A", all.data=TRUE)
I get:
A B.x B.y
1: 2 13 13
2: 3 14 14
However, I want all the rows in dataA in the final merged table. Is there a way to do this?
If you want to add the
b
values ofB
toA
, then it's best to joinA
withB
and updateA
by reference as follows:which gives:
This is a better appraoch than using
B[A, on='a']
because the latter just prints the result to the console. When you want to get the results back intoA
, you need to useA <- B[A, on='a']
which will give you the same result.The reason why
A[B, on = 'a', bb := i.b]
is better thanA <- B[A, on = 'a']
is memory efficiency. WithA[B, on = 'a', bb := i.b]
the location ofA
in memory stays the same:While on the other hand with
A <- B[A, on = 'a']
, a new object is created and saved in memory asA
and hence has another location in memory:Using
merge
(merge.data.table
) results in a similar change in memory location:For memory efficiency it is thus better to use an 'update-by-reference-join' syntax:
Although this doesn't make a noticable difference with small datasets like these, it does make a difference on large datasets for which
data.table
was designed.Probably also worth mentioning is that the order of
A
stays the same.Used data:
You can try this: