R: Creating a new column using another dataframe

I have two dataframes:

1) data1: data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"),stringsAsFactors=F)

2) data2: data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))

I want to create a new column in data2, say data2$Group using data1, where the group column checks using data1 which region falls under which group and populates that.How can I do that? Also, say, data1 were a list instead of a dataframe, what would be the possible approach?

标签： r dataframe mapping match

1条回答

smile是对你的礼貌

2楼-- · 2019-06-11 02:54

Using the datasets you posted you can do this

library(tidyverse)

# update data1
data1_upd = data1 %>% separate_rows(Region, sep = ", ")

# join datasets
data2_upd = data2 %>% left_join(data1_upd, by="Region")

The new dataset data2_upd will look like:

#               Region Group
# 1  California and HI     3
# 2  California and HI     3
# 3     Northeast High     2
# 4  California and HI     3
# 5          West High     3
# 6   Midwest Med High     2
# 7  California and HI     3
# 8  California and HI     3
# 9  California and HI     3
# 10     Southwest Low     1
# 11  Midwest Med High     2
# 12 California and HI     3
# 13          East Med     2
# 14                      NA
# 15                      NA
# 16                      NA
# 17     Southeast Med     1
# 18         West High     3
# 19    Northeast High     2
# 20 California and HI     3
# 21         West High     3
# 22 California and HI     3
# 23 California and HI     3
# 24         West High     3
# 25 California and HI     3
# 26         West High     3
# 27 California and HI     3
# 28 California and HI     3

Note that this approach uses an exact string matching in order to join the 2 datasets. Therefore, it is case sensitive and any spaces before or after your region will "break" the join. This means that if your data are not as "clean" as in your example, you might have to do some pre-processing (e.g. update regions to lowercase, remove any initial / trailing spaces).

0人赞添加讨论(0) 举报

R: Creating a new column using another dataframe

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间