R subsetting by partially matching row name

Posted 2019-08-13 01:31

I have a tab delimited file:

row.names c1 c2 c3
AF3 0 2 4
BN4 9 1 2 
AF2 8 7 1
BN8 4 6 8

I want to select only the rows whose row names begin with BN. The output would look like:

row.names c1 c2 c3
BN4 9 1 2 
BN8 4 6 8

I know how I would solve the problem if I knew the exact row names in a vector...

df[row.names(df) %in% c('BN4','BN8'), ]

But how would I solve the problem by finding and subsetting on the rows that start with 'BN'?

Tags: r regex subset

3 answers

叛逆

Reply #2 · 2019-08-13 01:34

You can use grep to find those rows whose names start with "BN".

Using x for the object instead of df (df is a function in R):

x[grep("^BN", row.names(x)),]
##     c1 c2 c3
## BN4  9  1  2
## BN8  4  6  8
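For a self-contained check, the sketch below rebuilds the question's table in memory and compares grep with two other base R options, grepl and startsWith. The inline text stands in for the real file, and the names by_index, by_logical and by_prefix are made up for illustration.

```r
# Rebuild the example table. With the real file this would be something like
# read.delim("file.txt", row.names = 1), where "file.txt" is a hypothetical path.
x <- read.table(
  text = "row.names c1 c2 c3
AF3 0 2 4
BN4 9 1 2
AF2 8 7 1
BN8 4 6 8",
  header = TRUE, row.names = 1
)

# grep() returns the integer positions of the matching names
by_index <- x[grep("^BN", row.names(x)), ]

# grepl() returns a logical vector, one element per name
by_logical <- x[grepl("^BN", row.names(x)), ]

# startsWith() is a fixed (non-regex) prefix test; needs R >= 3.3.0
by_prefix <- x[startsWith(row.names(x), "BN"), ]

print(by_index)
```

All three select the same rows here. The logical forms are safer when the selection is negated: x[-grep("^ZZ", row.names(x)), ] returns zero rows when nothing matches, while x[!grepl("^ZZ", row.names(x)), ] returns every row.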


霸刀☆藐视天下

Reply #3 · 2019-08-13 01:42

You could use slice() from dplyr

library(dplyr)
df %>% slice(grep("^BN", row.names(.)))

Which gives:

#  c1 c2 c3
#1  9  1  2
#2  4  6  8

Here, row names are silently dropped. To preserve them, you can convert to an explicit variable by using add_rownames():

df %>% add_rownames() %>% slice(grep("^BN", rowname))

or using filter():

df %>% add_rownames() %>% filter(grepl("^BN", rowname))

You get:

#  rowname c1 c2 c3
#1     BN4  9  1  2
#2     BN8  4  6  8
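Note that add_rownames() has since been deprecated in dplyr in favor of tibble::rownames_to_column(). A minimal sketch of the same filter with that function follows, assuming the dplyr and tibble packages are installed. The data frame is rebuilt inline as a stand-in for the question's data, and res is a name chosen for illustration.

```r
library(dplyr)
library(tibble)

# Stand-in for the question's data frame
df <- data.frame(
  c1 = c(0, 9, 8, 4),
  c2 = c(2, 1, 7, 6),
  c3 = c(4, 2, 1, 8),
  row.names = c("AF3", "BN4", "AF2", "BN8")
)

res <- df %>%
  rownames_to_column(var = "rowname") %>%  # row names become a regular column
  filter(grepl("^BN", rowname)) %>%        # keep names starting with "BN"
  column_to_rownames(var = "rowname")      # optional: turn them back into row names

print(res)
```

Dropping the last step leaves the names in the rowname column, which matches the output shown above.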


The star"

Reply #4 · 2019-08-13 02:00

Try using regular expressions with grepl:

df[grepl("BN\\d{1}", row.names(df)), ]

If you would prefer functions that are a little more descriptive, you can do the same thing with the stringr package:

library(stringr)
df[str_detect(row.names(df), "BN\\d{1}"), ]

The catch is that these calls will pick up any row whose name contains a three-character match of BN[digit] anywhere in the string. Something like XYBN9L would get picked up.
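The difference is easy to see on a small vector of names. In this sketch the nms vector is made up for illustration (XYBN9L is the example mentioned above), and the anchored pattern "^BN\\d" is one possible way to restrict the match to the start of the name.

```r
# Hypothetical row names, including one with BN[digit] in the middle
nms <- c("AF3", "BN4", "XYBN9L", "BN8", "BNX")

# Unanchored: "BN" followed by a digit may appear anywhere in the name
unanchored <- grepl("BN\\d{1}", nms)

# Anchored with ^: "BN" followed by a digit must be at the start of the name
anchored <- grepl("^BN\\d", nms)

print(nms[unanchored])
print(nms[anchored])
```

The unanchored pattern keeps XYBN9L, while the anchored one keeps only BN4 and BN8. Either logical vector can be used directly as the row index, for example df[anchored, ] when nms is row.names(df).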
