R: Creating a new column using another dataframe

2019-06-11 02:13发布

I have two dataframes:

1) data1: data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"),stringsAsFactors=F)

2) data2: data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))

I want to create a new column in data2, say data2$Group using data1, where the group column checks using data1 which region falls under which group and populates that.How can I do that? Also, say, data1 were a list instead of a dataframe, what would be the possible approach?

1条回答
smile是对你的礼貌
2楼-- · 2019-06-11 02:54

Using the datasets you posted you can do this

library(tidyverse)

# update data1
data1_upd = data1 %>% separate_rows(Region, sep = ", ")

# join datasets
data2_upd = data2 %>% left_join(data1_upd, by="Region")

The new dataset data2_upd will look like:

#               Region Group
# 1  California and HI     3
# 2  California and HI     3
# 3     Northeast High     2
# 4  California and HI     3
# 5          West High     3
# 6   Midwest Med High     2
# 7  California and HI     3
# 8  California and HI     3
# 9  California and HI     3
# 10     Southwest Low     1
# 11  Midwest Med High     2
# 12 California and HI     3
# 13          East Med     2
# 14                      NA
# 15                      NA
# 16                      NA
# 17     Southeast Med     1
# 18         West High     3
# 19    Northeast High     2
# 20 California and HI     3
# 21         West High     3
# 22 California and HI     3
# 23 California and HI     3
# 24         West High     3
# 25 California and HI     3
# 26         West High     3
# 27 California and HI     3
# 28 California and HI     3

Note that this approach uses an exact string matching in order to join the 2 datasets. Therefore, it is case sensitive and any spaces before or after your region will "break" the join. This means that if your data are not as "clean" as in your example, you might have to do some pre-processing (e.g. update regions to lowercase, remove any initial / trailing spaces).

查看更多
登录 后发表回答