Conditional merge/replacement in R

2019-01-02 23:46发布

I have two data frames:

df1
x1  x2
1   a
2   b
3   c
4   d

and

df2
x1  x2
2   zz
3   qq

I want to replace some of the values in df1$x2 with values in df2$x2 based on the conditional match between df1$x1 and df2$x2 to produce:

df1
x1  x2
1   a
2   zz
3   qq
4   d

标签: r match r-faq
4条回答
一纸荒年 Trace。
2楼-- · 2019-01-03 00:16

The first part of Joris' answer is good, but in the case of non-unique values in df1, the row-wise for-loop will not scale well on large data.frames.

You could use a data.table "update join" to modify in place, which will be quite fast:

library(data.table)
setDT(df1); setDT(df2)
df1[df2, on = .(x1), x2 := i.x2]

Or, assuming you don't care about maintaining row order, you could use SQL-inspired dplyr:

library(dplyr)
union_all(
  inner_join( df1["x1"], df2 ), # x1 from df1 with matches in df2, x2 from df2
  anti_join(  df1, df2["x1"] )  # rows of df1 with no match in df2
) # %>% arrange(x1) # optional, won't maintain an arbitrary row order

Either of these will scale much better than the row-wise for-loop.

查看更多
可以哭但决不认输i
3楼-- · 2019-01-03 00:27

I see that Joris and Aaron have both chosen to build examples without factors. I can certainly understand that choice. For the reader with columns that are already factors there would also be to option of coercion to "character". There is a strategy that avoids that constraint and which also allows for the possibility that there may be indices in df2 that are not in df1 which I believe would invalidate Joris Meys but not Aarons solutions posted so far:

df1 <- data.frame(x1=1:4,x2=letters[1:4])
df2 <- data.frame(x1=c(2,3,5), x2=c("zz", "qq", "xx") )

It requires that the levels be expanded to include the intersection of both factor variables and then also the need to drop non-matching columns (= NA values) in match(df1$x1, df2$x1)

 df1$x2 <- factor(df1$x2 , levels=c(levels(df1$x2), levels(df2$x2)) )
 df1$x2[na.omit(match(df2$x1,df1$x1))] <- df2$x2[which(df2$x1 %in% df1$x1)]
 df1
#-----------
  x1 x2
1  1  a
2  2 zz
3  3 qq
4  4  d
查看更多
家丑人穷心不美
4楼-- · 2019-01-03 00:30

use match(), assuming values in df1 are unique.

df1 <- data.frame(x1=1:4,x2=letters[1:4],stringsAsFactors=FALSE)
df2 <- data.frame(x1=2:3,x2=c("zz","qq"),stringsAsFactors=FALSE)

df1$x2[match(df2$x1,df1$x1)] <- df2$x2
> df1
  x1 x2
1  1  a
2  2 zz
3  3 qq
4  4  d

If the values aren't unique, use :

for(id in 1:nrow(df2)){
  df1$x2[df1$x1 %in% df2$x1[id]] <- df2$x2[id]
}
查看更多
beautiful°
5楼-- · 2019-01-03 00:31

You can do it by matching the other way too but it's more complicated. Joris's solution is better but I'm putting this here also as a reminder to think about which way you want to match.

df1 <- data.frame(x1=1:4, x2=letters[1:4], stringsAsFactors=FALSE)
df2 <- data.frame(x1=2:3, x2=c("zz", "qq"), stringsAsFactors=FALSE)
swap <- df2$x2[match(df1$x1, df2$x1)]
ok <- !is.na(swap)
df1$x2[ok] <- swap[ok]

> df1
  x1 x2
1  1  a
2  2 zz
3  3 qq
4  4  d
查看更多
登录 后发表回答