I have a dataset with many misnamed entries. I created a two column .csv that includes the old (incorrect) names in one column and the corresponding new (correct) names in the second column. Now I need to tell R to replace every old name in the data with the correct name.
testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
newName = c("Boston", "New York", "Washington DC"))
# The next line fails.
holder = replace(testData, testData[, oldName]==replacements[, oldName],
replacements[, newName]
This is how I'd do that replacement:
setkey(testData, oldName)
setkey(replacements, oldName)
testData[replacements, oldName := newName]
testData
# oldName
#1: Boston
#2: New York
#3: New York
#4: Washington DC
You can add an index if you like the original order and put it back in original order at the end.
I reached here looking for a solution and managed to tweak it to my requirement. If original order needs to be maintained then don't use setkey
. I've added mutually exclusive rows on both tables for a better test.
library(data.table)
testData = data.table(
city = c("Nu York", "Was DC", "Buston", "Nu York", "Alabama")
)
If the join by column name in lookup table is same:
replacements = data.table(
city = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city)][]
If the join by column name in lookup table is different:
replacements = data.table(
city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city = city_oldName)][]
Either way, testData
will be changed to:
city
1: New York
2: Washington DC
3: Boston
4: New York
5: Alabama
No keys are made and original order is retained.