I am trying to do a filter in dplyr where a column is like certain observations. I can use sqldf as
Test <- sqldf("select * from database
Where SOURCE LIKE '%ALPHA%'
OR SOURCE LIKE '%BETA%'
OR SOURCE LIKE '%GAMMA%'")
I tried to use the following which doesn't return any results:
database %>% dplyr::filter(SOURCE %like% c('%ALPHA%', '%BETA%', '%GAMMA%'))
Thanks
You can use grepl
with ALPHA|BETA|GAMMA
, which will match if any of the three patterns is contained in SOURCE column.
database %>% filter(grepl('ALPHA|BETA|GAMMA', SOURCE))
If you want it to be case insensitive, add ignore.case = T
in grepl
.
%like%
is from the data.table
package. You're probably also seeing this warning message:
Warning message:
In grepl(pattern, vector) :
argument 'pattern' has length > 1 and only the first element will be used
The %like%
operator is just a wrapper around the grepl
function, which does string matching using regular expressions. So %
aren't necessary, and in fact they represent literal percent signs.
You can only supply one pattern to match at a time, so either combine them using the regex 'ALPHA|BETA|GAMMA'
(as Psidom suggests) or break the tests into three statements:
database %>%
dplyr::filter(
SOURCE %like% 'ALPHA' |
SOURCE %like% 'BETA' |
SOURCE %like% 'GAMMA'
)