I have two dataframes:
1) data1: data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"),stringsAsFactors=F)
2) data2: data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))
I want to create a new column in data2, say data2$Group
using data1, where the group column checks using data1 which region falls under which group and populates that.How can I do that? Also, say, data1 were a list instead of a dataframe, what would be the possible approach?
Using the datasets you posted you can do this
The new dataset
data2_upd
will look like:Note that this approach uses an exact string matching in order to join the 2 datasets. Therefore, it is case sensitive and any spaces before or after your region will "break" the join. This means that if your data are not as "clean" as in your example, you might have to do some pre-processing (e.g. update regions to lowercase, remove any initial / trailing spaces).