I am trying to work out how to filter some observations from a large dataset using `dplyr` and `grepl`. I am not wedded to `grepl` if other solutions would work better.
Take this sample df:
```r
df1 <- data.frame(fruit=c("apple", "orange", "xapple", "xorange",
"applexx", "orangexx", "banxana", "appxxle"), group=c("A", "B") )
df1
# fruit group
#1 apple A
#2 orange B
#3 xapple A
#4 xorange B
#5 applexx A
#6 orangexx B
#7 banxana A
#8 appxxle B
```
I want to:
- filter out those cases beginning with 'x'
- filter out those cases ending with 'xx'
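Both requirements are about *where* the pattern sits in the string, which is what the regex anchors `^` (start of string) and `$` (end of string) express. A minimal sketch, assuming `dplyr` is installed; `out` is just an illustrative name:

```r
library(dplyr)

# Sample data from above; stringsAsFactors = FALSE keeps fruit as character
# on R versions older than 4.0 (grepl would also cope with a factor)
df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE
)

# "^x"  matches an "x" only at the very start of the string
# "xx$" matches "xx" only at the very end of the string
# Comma-separated conditions in filter() are combined with AND
out <- df1 %>%
  filter(!grepl("^x", fruit), !grepl("xx$", fruit))

out
```

The two conditions could also be merged into a single pattern, `filter(!grepl("^x|xx$", fruit))`, which drops the same rows.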
I have managed to work out how to drop every row that contains 'x' or 'xx' anywhere, but not only the rows that begin with 'x' or end with 'xx'. Here is how to get rid of everything with 'xx' anywhere in it (not just at the end):
```r
library(dplyr)
df1 %>% filter(!grepl("xx",fruit))
# fruit group
#1 apple A
#2 orange B
#3 xapple A
#4 xorange B
#5 banxana A
```
This (erroneously, from my point of view) also removed 'appxxle', while 'xapple' and 'xorange' were kept.
I have never fully got to grips with regular expressions. I've been trying to adapt code such as `grepl("^(?!x).*$", df1$fruit, perl = TRUE)` to make it work inside `filter()`, but I am not quite getting it.
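The lookahead idea can be extended with a negative lookbehind so that one Perl-style pattern describes the rows to *keep*: `(?!x)` right after `^` rejects a leading "x", and `(?<!xx)` right before `$` rejects a trailing "xx". A sketch under the same assumptions as before (`dplyr` installed, `out` an illustrative name); note there is no `!` because the pattern matches the wanted rows:

```r
library(dplyr)

df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE
)

# ^(?!x)   : at the start, the next character must not be "x"
# .*       : any characters in between
# (?<!xx)$ : at the end, the last two characters must not be "xx"
# perl = TRUE is needed for lookahead/lookbehind support
keep_pattern <- "^(?!x).*(?<!xx)$"

out <- df1 %>%
  filter(grepl(keep_pattern, fruit, perl = TRUE))

out
```

This is harder to read than two anchored conditions, so it is mainly useful if a single pattern is required.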
Expected output:
```r
# fruit group
#1 apple A
#2 orange B
#3 banxana A
#4 appxxle B
```
I'd like to do this inside `dplyr` if possible.
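Since neither condition really needs a regular expression, base R's `startsWith()` and `endsWith()` can be used inside `filter()` instead; they compare fixed prefixes and suffixes. A sketch, assuming `dplyr` is installed; whether it is faster than `grepl` on a large dataset would need to be measured. Unlike `grepl`, these functions require a character column and error on a factor, hence `stringsAsFactors = FALSE`:

```r
library(dplyr)

df1 <- data.frame(
  fruit = c("apple", "orange", "xapple", "xorange",
            "applexx", "orangexx", "banxana", "appxxle"),
  group = c("A", "B"),
  stringsAsFactors = FALSE  # startsWith()/endsWith() need character input
)

# Keep rows whose fruit neither starts with "x" nor ends with "xx"
out <- df1 %>%
  filter(!startsWith(fruit, "x"), !endsWith(fruit, "xx"))

out
```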