I have a 371MB text file containing microRNA data. Essentially, I would like to select only those rows that have information about human microRNA.
I have read in the file using read.table. Usually, I'd accomplish what I want with sqldf, if it had a 'like' syntax (select * from <> where miRNA like 'hsa'). Unfortunately, sqldf does not support that syntax.
How can I do this in R? I have looked around Stack Overflow and do not see an example of how I can do a partial string match. I even installed the stringr package, but it does not quite have what I need.
What I would like to do is something like this, where all rows matching hsa-* are selected:
selectedRows <- conservedData[, conservedData$miRNA %like% "hsa-"]
which of course, is not correct syntax.
Can somebody please help me with this? Thanks a lot for reading.
Asda
Try `str_detect()` from the stringr package, which detects the presence or absence of a pattern in a string.

Here is an approach that also incorporates the `%>%` pipe and `filter()` from the dplyr package: `CO2 %>% filter(str_detect(Treatment, "non"))`. This filters the sample CO2 data set (that comes with R) for rows where the Treatment variable contains the substring "non". You can adjust whether `str_detect` finds fixed matches or uses a regex; see the documentation for the stringr package.

`LIKE`
should work in sqlite, as long as the pattern includes the `%` wildcard, for example `select * from conservedData where miRNA like 'hsa-%'`.

I notice that you mention a function `%like%` in your current approach. I don't know if that's a reference to the `%like%` from "data.table", but if it is, you can definitely use it by putting the condition in the row position: `conservedData[conservedData$miRNA %like% "hsa-", ]`.

Note that the object does not have to be a `data.table` (but also remember that subsetting approaches for `data.frame`s and `data.table`s are not identical).

If that is what you had, then perhaps you had just mixed up row and column positions for subsetting data.
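
To compare the package-based suggestions above (stringr with dplyr, sqldf, and data.table's `%like%`) side by side, here is a minimal sketch. It runs on a tiny made-up stand-in for `conservedData`: only the names `conservedData` and `miRNA` come from the question, while the miRNA values and the `score` column are invented for illustration. It assumes the stringr, dplyr, sqldf and data.table packages are installed.

```r
library(dplyr)
library(stringr)
library(sqldf)
library(data.table)

# Toy stand-in for the real 371MB file; all values are invented.
conservedData <- data.frame(
  miRNA = c("hsa-let-7a", "mmu-miR-1", "hsa-miR-21", "dre-miR-10"),
  score = c(0.9, 0.4, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# 1. stringr + dplyr: keep rows whose miRNA starts with "hsa-"
#    (the pattern is a regex; ^ anchors it to the start of the string).
human_stringr <- conservedData %>% filter(str_detect(miRNA, "^hsa-"))

# 2. sqldf: LIKE needs the % wildcard to match "anything after hsa-".
human_sql <- sqldf("select * from conservedData where miRNA like 'hsa-%'")

# 3. data.table's %like% on a plain data.frame:
#    the logical condition goes in the ROW position, before the comma.
human_like <- conservedData[conservedData$miRNA %like% "^hsa-", ]

# The same filter with data.table's own subsetting syntax (no comma needed).
dt <- as.data.table(conservedData)
human_dt <- dt[miRNA %like% "^hsa-"]
```

All four results should contain the same two `hsa-` rows. On a file of this size, the regex-based variants are likely the simplest to try first, since they need no SQL layer.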
If you don't want to load a package, you can try using `grep()` to search for the string you're matching. For example, with the `mtcars` dataset you can match all rows whose row names include "Merc", or with the `iris` dataset you can search the Species column for the string `osa`.

For your problem, try something like `conservedData[grep("hsa-", conservedData$miRNA), ]`.
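
As a runnable sketch of the base R `grep()` approach, the block below uses the built-in `mtcars` and `iris` datasets mentioned above, plus a small made-up stand-in for `conservedData` (the miRNA values and the `score` column are invented; only the object and column names come from the question). The `grepl()` variant at the end is an addition, not part of the original suggestion.

```r
# Rows of mtcars whose row names contain "Merc".
merc <- mtcars[grep("Merc", rownames(mtcars)), ]

# Rows of iris whose Species contains "osa" (only setosa does).
osa <- iris[grep("osa", iris$Species), ]

# Toy stand-in for the real data; all values are invented.
conservedData <- data.frame(
  miRNA = c("hsa-let-7a", "mmu-miR-1", "hsa-miR-21", "dre-miR-10"),
  score = c(0.9, 0.4, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# grep() returns the indices of matching elements; use them as row indices.
selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]

# Alternative: grepl() returns a logical vector instead of indices.
# fixed = TRUE treats "hsa-" as a literal string rather than a regex.
selectedRows2 <- conservedData[grepl("hsa-", conservedData$miRNA, fixed = TRUE), ]
```

Note that the index goes before the comma, because it selects rows; the attempt in the question placed it after the comma, which selects columns.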