R: cbind based on matching the first few letters or numbers

Posted 2019-05-30 05:50

Question:

I have df1 like this:

df1 <- data.frame(A=c("x01","x02","y03","z02","x04"), B=c("A01BB01","A02BB02","C02AA05","B04CC10","C01GX02"))

    A       B
1 x01 A01BB01
2 x02 A02BB02
3 y03 C02AA05
4 z02 B04CC10
5 x04 C01GX02

I have df2 like this.

  X     Y
1 a A01BB
2 b   A02
3 c  C02A
4 d   B04
5 e C01GX

df2 <- data.frame(X=c("a","b","c","d","e"), Y=c("A01BB","A02","C02A","B04","C01GX"))

I want to match the first few letters/numbers of df1$B against the values in df2$Y, and then merge the two data frames on the best match. The expected result is a data frame like this:

    A       B X     Y
1 x01 A01BB01 a A01BB
2 x02 A02BB02 b   A02
3 y03 C02AA05 c  C02A
4 z02 B04CC10 d   B04
5 x04 C01GX02 e C01GX

Could you show me how to do this? Thanks.

The match may only occur at the start of the string: the matched portion must not appear in the middle or at the end of the values in df1$B. Is there an efficient way to do this in R?

Answer 1:

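You can use pmatch for this kind of matching. It matches each element of its first argument against the start of the elements of its second argument, so a match can only occur at the beginning of df1$B. A prefix that matches nothing, or that matches several values equally well, gives NA:

with(c(df1, df2), {
  # for each prefix in Y, the position of the value in B that starts with it
  i <- pmatch(Y, B)
  # index the df1 columns by i so every prefix is paired with its own match
  data.frame(A = A[i], B = B[i], X, Y)
})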
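
If the prefixes in df2$Y can overlap (say both "A01" and "A01BB" are present), "best match" could be read as "longest matching prefix". The following is only a sketch of that reading, not part of the original answer. It assumes base R 3.3.0 or later for startsWith, and the helper name longest_prefix is made up for illustration:

```r
df1 <- data.frame(A = c("x01", "x02", "y03", "z02", "x04"),
                  B = c("A01BB01", "A02BB02", "C02AA05", "B04CC10", "C01GX02"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X = c("a", "b", "c", "d", "e"),
                  Y = c("A01BB", "A02", "C02A", "B04", "C01GX"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: for each string in b, return the index of the
# longest element of y that b starts with, or NA if none matches.
longest_prefix <- function(b, y) {
  vapply(b, function(s) {
    hits <- which(startsWith(s, y))       # prefixes found at the start of s
    if (length(hits) == 0L) return(NA_integer_)
    hits[which.max(nchar(y[hits]))]       # keep the longest one
  }, integer(1), USE.NAMES = FALSE)
}

j <- longest_prefix(df1$B, df2$Y)         # row of df2 for each row of df1
result <- cbind(df1, df2[j, , drop = FALSE])
rownames(result) <- NULL
result
```

This version keeps the rows in df1 order and fills X and Y with NA for rows that have no matching prefix. It loops over df1$B in R code, so for very large inputs a join-based approach may be faster.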


Tags: r match