How to grep a group based on string in another col

Posted 2019-08-28 01:13

I am simplifying a previous question that failed.

I want to extract whole groups, identified by 'id', that contain a string ('inter' or 'high') in another column called 'strmatch'. The string doesn't occur in every observation of a group, but if it occurs in at least one row, I want to assign the whole group to the corresponding data frame.

The data frame

df <- data.frame(id = c("a", "a", "b", "b","c", "c","d","d"),
                 std = c("y", "y","n","n","y","y","n","n"),
                 strmatch = c("alpha","TMB-inter","beta","TMB-high","gamma","delta","epsilon","TMB-inter"))

Looks like this

id  std strmatch
a   y   alpha
a   y   TMB-inter
b   n   beta
b   n   TMB-high
c   y   gamma
c   y   delta
d   n   epsilon
d   n   TMB-inter

Expected result

dfa

id  std strmatch
a   y   alpha
a   y   TMB-inter
d   n   epsilon
d   n   TMB-inter

dfb

id  std strmatch
b   n   beta
b   n   TMB-high

dfc

id  std strmatch
c   y   gamma
c   y   delta

What I've tried

split(df, grepl("high", df$strmatch))

This gives only two data frames: one with the single row containing 'high' and one with all the remaining rows.
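A minimal sketch of why this happens, using only the two relevant columns from the example data: grepl() tests each row on its own, so the split factor is row-level rather than group-level. The names row_flag and group_flag are illustrative, not part of the original attempt.

```r
id <- c("a", "a", "b", "b", "c", "c", "d", "d")
strmatch <- c("alpha", "TMB-inter", "beta", "TMB-high",
              "gamma", "delta", "epsilon", "TMB-inter")

# grepl() returns one TRUE/FALSE per row, so split() sees only two levels
row_flag <- grepl("high", strmatch)

# Spreading the flag to every row that shares an id with a matching row
# turns it into a group-level flag
group_flag <- id %in% id[row_flag]
```

Splitting on group_flag instead of row_flag would keep both rows of group 'b' together, although it still separates only 'high' from everything else.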

Thanks a lot for your help.

Tags: r dataframe group-by subset

1 answer

Lonely孤独者°

#2 · 2019-08-28 01:48

You could divide this into two parts. First, find the values that match "inter|high" and build one data frame per matching value, each containing every row of the ids where that value occurs. Then collect the ids that do not match any of unique_vals into a final data frame.

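# Distinct strmatch values that contain "inter" or "high"
unique_vals <- unique(grep("inter|high", df$strmatch, value = TRUE))

# One data frame per matching value (all rows of the ids where it occurs),
# followed by one data frame for the ids with no match at all
c(lapply(unique_vals, function(x) subset(df, id %in% id[strmatch == x])),
  list(subset(df, !id %in% id[strmatch %in% unique_vals])))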


#[[1]]
#  id std  strmatch
#1  a   y     alpha
#2  a   y TMB-inter
#7  d   n   epsilon
#8  d   n TMB-inter

#[[2]]
#  id std strmatch
#3  b   n     beta
#4  b   n TMB-high

#[[3]]
#  id std strmatch
#5  c   y    gamma
#6  c   y    delta
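As an alternative, here is a minimal sketch that labels each id once and then lets split() do the grouping. It is not the approach used above. The helper label_group and the labels "inter", "high" and "none" are hypothetical names chosen for illustration, and the sketch assumes a group that contained both strings should count as "inter".

```r
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c", "d", "d"),
  std = c("y", "y", "n", "n", "y", "y", "n", "n"),
  strmatch = c("alpha", "TMB-inter", "beta", "TMB-high",
               "gamma", "delta", "epsilon", "TMB-inter"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reduce one group's strmatch values to a single label
label_group <- function(x) {
  if (any(grepl("inter", x))) {
    "inter"
  } else if (any(grepl("high", x))) {
    "high"
  } else {
    "none"
  }
}

# One label per id, then looked up by name so it lines up with the rows of df
labels <- tapply(df$strmatch, df$id, label_group)
group_label <- labels[as.character(df$id)]

# Named list of data frames, one per label
result <- split(df, group_label)
```

With this, result$inter, result$high and result$none would correspond to dfa, dfb and dfc from the question.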
