I have two data frames:
df1
x1 x2
1 a
2 b
3 c
4 d
and
df2
x1 x2
2 zz
3 qq
I want to replace some of the values in df1$x2 with values in df2$x2, based on a match between df1$x1 and df2$x1, to produce:
df1
x1 x2
1 a
2 zz
3 qq
4 d
The first part of Joris' answer is good, but in the case of non-unique values in df1, the row-wise for-loop will not scale well on large data.frames.
You could use a data.table "update join" to modify in place, which will be quite fast. Or, assuming you don't care about maintaining row order, you could use SQL-inspired dplyr.
Either of these will scale much better than the row-wise for-loop.
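The two approaches mentioned above could look roughly like the following. This is a sketch under assumptions: the example data is rebuilt from the question with character columns, the names dt1, dt2, matched, unmatched and res_dplyr are made up for illustration, and the dplyr part is only one of several ways to combine joins.

```r
library(data.table)

# Example data from the question, kept as character columns
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), stringsAsFactors = FALSE)

# data.table update join: for rows of dt1 that match dt2 on x1,
# overwrite x2 by reference with dt2's x2 (available as i.x2)
dt1 <- as.data.table(df1)  # as.data.table() copies, so df1 is left untouched
dt2 <- as.data.table(df2)
dt1[dt2, on = "x1", x2 := i.x2]

# dplyr: matched rows take x2 from df2, unmatched rows are kept as they are.
# Row order is not preserved: matched rows come first.
matched   <- dplyr::inner_join(df1, df2, by = "x1")  # columns x1, x2.x, x2.y
matched   <- dplyr::select(matched, x1, x2 = x2.y)
unmatched <- dplyr::anti_join(df1, df2, by = "x1")
res_dplyr <- dplyr::bind_rows(matched, unmatched)
```

Note that the data.table version changes dt1 in place, while the dplyr version returns a new data frame that you would need to assign back.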
I see that Joris and Aaron have both chosen to build examples without factors. I can certainly understand that choice. For the reader with columns that are already factors, there would also be the option of coercion to "character". There is a strategy that avoids that constraint and which also allows for the possibility that there may be indices in df2 that are not in df1, which I believe would invalidate Joris Meys' but not Aaron's solution posted so far.
It requires that the levels be expanded to include the union of the levels of both factor variables, and then also the need to drop the non-matching entries (= NA values) in match(df1$x1, df2$x1).
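A minimal sketch of that factor-preserving strategy, in base R. The extra row in df2 (key 5, value "xx") is an assumption added here to show a key that has no partner in df1; idx and hit are illustrative names.

```r
# Same data as the question, but with x2 stored as a factor,
# plus one df2 row (x1 = 5) whose key does not occur in df1
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", "c", "d"), stringsAsFactors = TRUE)
df2 <- data.frame(x1 = c(2, 3, 5), x2 = c("zz", "qq", "xx"), stringsAsFactors = TRUE)

# Expand the levels of df1$x2 so that it can hold the replacement values;
# union() keeps the existing levels first, so existing codes are unchanged
levels(df1$x2) <- union(levels(df1$x2), levels(df2$x2))

# For each row of df1, the position of its key in df2 (NA when there is none)
idx <- match(df1$x1, df2$x1)
hit <- !is.na(idx)  # drop the non-matching entries

# Replace only the matched rows; df1$x2 remains a factor
df1$x2[hit] <- as.character(df2$x2[idx[hit]])
```

The unmatched df2 row is simply ignored, although its value "xx" remains among the levels of df1$x2 unless you call droplevels() afterwards.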
Use match(), assuming values in df1 are unique. If the values aren't unique, use a row-wise for-loop.
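Both variants might be sketched like this in base R. The data frame dup, with a repeated key, is a made-up example for the non-unique case, and the match() line assumes every key in df2 also occurs in df1 (otherwise the NA index makes the assignment fail).

```r
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), stringsAsFactors = FALSE)

# Unique keys in df1: look up where each df2 key sits in df1
# and overwrite x2 at those positions
res_match <- df1
res_match$x2[match(df2$x1, res_match$x1)] <- df2$x2

# Non-unique keys: match() only returns the first hit, so loop over the
# rows of df2 and replace every row of the target that has that key
dup <- data.frame(x1 = c(1, 2, 2, 3), x2 = c("a", "b", "b2", "c"),
                  stringsAsFactors = FALSE)
for (i in seq_len(nrow(df2))) {
  dup$x2[dup$x1 == df2$x1[i]] <- df2$x2[i]
}
```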
You can do it by matching the other way too but it's more complicated. Joris's solution is better but I'm putting this here also as a reminder to think about which way you want to match.
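Matching "the other way" could be sketched as below: look up each df1 key in df2 rather than each df2 key in df1. The names swap and ok are illustrative, and character columns are assumed.

```r
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), stringsAsFactors = FALSE)

# For every row of df1, fetch the replacement from df2 (NA when no match)
swap <- df2$x2[match(df1$x1, df2$x1)]

# Only overwrite the rows that actually found a replacement
ok <- !is.na(swap)
df1$x2[ok] <- swap[ok]
```

In this direction, repeated keys in df1 are all replaced, and keys that appear only in df2 are ignored. The filter on NA would also skip a replacement value that is itself NA.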