dplyr string match and replace based on lookup tab

2019-08-27 17:38发布

问题:

I'm trying to achieve a function that I previously did in Excel, but having trouble finding a means to achieve it.

I have two datasets: one is my base dataset and the other is a lookup table. My base has two columns, the first and last names of people. My lookup table has these first two columns as well, but it also includes a replacement first name.

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert")
)

What I want to do is to replace the Fname with NewFname, dependent upon two conditions: that Fname and Sname match in both dataframes. This is because I have a dataset with other 40,000 rows of data that need to be processed. Ultimately, I'd hope to end up with the following dataframe:

People <- data.frame(
  Fname = c("Tommy","Tim","Jerry","Ben","Roderick","John","Perry","Robert"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)

However, I want a function solution so I don't have to manually input the conditions and replacement names individually. So far, I have the following (problematic) solution, which would involve generating a new column using mutate in dplyr, but it's not working

 People %>%
  mutate(NewName = if_else(
    Fname == Lookup$Fname & Sname == Lookup$Sname, NewFname, Fname
  ))

回答1:

Simply use a left_join and then mutate on !is.na()

library(dplyr)
People %>% 
  left_join(Lookup, by = c("Fname", "Sname")) %>% 
  mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname))
# Fname     Sname       NewFname
# 1    Tommy    Harper    Tommy
# 2      Tim  Kingston      Tim
# 3    Jerry    Ribery     <NA>
# 4      Ben   Ghazali     <NA>
# 5 Roderick     Baker Roderick
# 6     John    Falcon     <NA>
# 7    Perry Jefferson     <NA>
# 8   Robert  Lombardy   Robert

I left NewFname just to make it clear whats happening.

Data:

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"), stringsAsFactors = F
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"), stringsAsFactors = F
)