Match and replace many values in data.table

2019-03-20 19:48发布

问题:

I have a dataset with many misnamed entries. I created a two column .csv that includes the old (incorrect) names in one column and the corresponding new (correct) names in the second column. Now I need to tell R to replace every old name in the data with the correct name.

testData = data.table(oldName = c("Nu York", "Was DC", "Buston",  "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"), 
    newName = c("Boston", "New York", "Washington DC"))

    # The next line fails.
holder = replace(testData, testData[, oldName]==replacements[, oldName], 
    replacements[, newName]

回答1:

This is how I'd do that replacement:

setkey(testData, oldName)
setkey(replacements, oldName)

testData[replacements, oldName := newName]
testData
#         oldName
#1:        Boston
#2:      New York
#3:      New York
#4: Washington DC

You can add an index if you like the original order and put it back in original order at the end.



回答2:

I reached here looking for a solution and managed to tweak it to my requirement. If original order needs to be maintained then don't use setkey. I've added mutually exclusive rows on both tables for a better test.

library(data.table)

testData = data.table(
  city = c("Nu York", "Was DC", "Buston",  "Nu York", "Alabama")
)

If the join by column name in lookup table is same:

replacements = data.table(
  city = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city)][]

If the join by column name in lookup table is different:

replacements = data.table(
  city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city = city_oldName)][]

Either way, testData will be changed to:

            city
1:      New York
2: Washington DC
3:        Boston
4:      New York
5:       Alabama

No keys are made and original order is retained.