Efficient left_join and subsequent merge

2019-09-16 02:43发布

问题:

I have the following data:

library(dplyr)

a<-data.frame("one"=c(1:10),
              "two"=c("","","","","a","a","a","a","a","a"), stringsAsFactors = F)

b<-data.frame("one"=c(4,2,6),
              "two"=c("C","D","A"), stringsAsFactors = F)

I want to left_join b onto a, such that a$two gets the value of b$two whenever a$one == b$one. This I do like this:

left_join(a, b, by="one")

In order to have the same structures as before, we can do the following

left_join(a, b, by="one") %>% 
  mutate(two=ifelse(is.na(two.y), two.x, two.y)) %>% 
  select(-c(two.x, two.y))

This gives me the desired output:

Is there a way to perform the left_join such that it isn't necessary to mutate and select to obtain the desired output? I.e., is there a more efficient way to get what I want? Right now it seems cumbersome to both mutate and select

回答1:

If we are looking for a compact and efficient option, then this can be achieved with data.table. After converting the 'a' to data.table, join on 'one' and assign (:=) the 'i.two' i.e. the column from 'b' to 'two' (from 'a')

library(data.table)
setDT(a)[b,two := i.two , on = .(one)]
a
#     one two
# 1:   1    
# 2:   2   D
# 3:   3    
# 4:   4   C
# 5:   5   a
# 6:   6   A
# 7:   7   a
# 8:   8   a
# 9:   9   a
#10:  10   a