I have a tab delimited file:
row.names c1 c2 c3
AF3 0 2 4
BN4 9 1 2
AF2 8 7 1
BN8 4 6 8
And I want to select only the rows whose row names begin with BN; the output would look like:
row.names c1 c2 c3
BN4 9 1 2
BN8 4 6 8
I know how I would solve the problem if I knew the exact row names in a vector...
df[row.names(df) %in% c('BN4','BN8'), ]
But how would I solve the problem by finding and subsetting on the rows that start with 'BN'?
You can use grep() to find the rows whose names start with "BN". I use x for the object instead of df (df is a function in R):
x[grep("^BN", row.names(x)),]
## c1 c2 c3
## BN4 9 1 2
## BN8 4 6 8
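For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the example data by hand, which is only my assumption about what reading the tab-delimited file would give, and it also shows base R's startsWith() (available since R 3.3.0) as a regex-free alternative that was not part of the original answer.

```r
# Rebuild the example data by hand. In practice it would come from something
# like read.delim("file.txt", row.names = 1); the file name is hypothetical.
x <- data.frame(
  c1 = c(0, 9, 8, 4),
  c2 = c(2, 1, 7, 6),
  c3 = c(4, 2, 1, 8),
  row.names = c("AF3", "BN4", "AF2", "BN8")
)

# Regex version: "^" anchors the match to the start of the row name.
by_grep <- x[grep("^BN", row.names(x)), ]

# Literal-prefix version: no regular expression involved.
by_prefix <- x[startsWith(row.names(x), "BN"), ]

print(by_grep)
print(by_prefix)
```

Both calls keep the BN4 and BN8 rows. startsWith() treats the prefix as a literal string, which is convenient if the prefix could contain regex metacharacters.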
You could use slice() from dplyr:
library(dplyr)
df %>% slice(grep("^BN", row.names(.)))
Which gives:
# c1 c2 c3
#1 9 1 2
#2 4 6 8
Here, row names are silently dropped. To preserve them, you can convert them to an explicit variable with add_rownames():
df %>% add_rownames() %>% slice(grep("^BN", rowname))
or using filter():
df %>% add_rownames() %>% filter(grepl("^BN", rowname))
You get:
# rowname c1 c2 c3
#1 BN4 9 1 2
#2 BN8 4 6 8
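One caveat for newer installations: add_rownames() has since been deprecated in dplyr in favour of tibble::rownames_to_column(). The sketch below shows the same filter with that function, assuming the dplyr and tibble packages are installed; the data frame is rebuilt by hand purely for illustration.

```r
library(dplyr)
library(tibble)

# Example data rebuilt by hand (an assumption about what the file contains).
df <- data.frame(
  c1 = c(0, 9, 8, 4),
  c2 = c(2, 1, 7, 6),
  c3 = c(4, 2, 1, 8),
  row.names = c("AF3", "BN4", "AF2", "BN8")
)

# Move the row names into a regular column, then filter on that column.
res <- df %>%
  rownames_to_column(var = "rowname") %>%
  filter(grepl("^BN", rowname))

print(res)
```

The result carries the names in the rowname column, so they survive later dplyr steps.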
Try using regular expressions with grepl():
df[grepl("BN\\d{1}", row.names(df)), ]
If you would prefer functions that are a little more descriptive, you can do the same thing with the stringr package:
library(stringr)
df[str_detect(row.names(df), "BN\\d{1}"), ]
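The catch is that these calls will pick up any row whose name contains the three-character sequence BN[digit] anywhere in the string, so something like XYBN9L would also be picked up. Anchoring the pattern with ^, as in "^BN\\d", restricts the match to the start of the name.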
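To make the catch concrete, here is a small sketch comparing an unanchored and an anchored pattern. The vector of names is made up for illustration and includes the XYBN9L case mentioned above.

```r
# Hypothetical row names, including one that contains "BN9" in the middle.
ids <- c("AF3", "BN4", "XYBN9L", "BN8", "BNX")

# Unanchored: matches "BN" followed by a digit anywhere in the string.
unanchored <- ids[grepl("BN\\d{1}", ids)]

# Anchored: "^" requires the match to begin at the start of the string.
anchored <- ids[grepl("^BN\\d", ids)]

print(unanchored)  # "BN4" "XYBN9L" "BN8"
print(anchored)    # "BN4" "BN8"
```

Note that {1} is redundant, since \\d already matches exactly one digit.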