I am using sparklyr and have a Spark dataframe with a column word that contains words, some of which contain special characters that I want to remove. I was successful in using regexp_replace with \\\\ before the special characters, like this:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\\\(', '')) %>%
mutate(word = regexp_replace(word, '\\\\)', '')) %>%
mutate(word = regexp_replace(word, '\\\\+', '')) %>%
mutate(word = regexp_replace(word, '\\\\?', '')) %>%
mutate(word = regexp_replace(word, '\\\\:', '')) %>%
mutate(word = regexp_replace(word, '\\\\;', '')) %>%
mutate(word = regexp_replace(word, '\\\\!', ''))
Now I want to remove the backslash (\). I have tried both:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\\\\', ''))
and:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\', ''))
But neither works.
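You have to escape the backslash for both the R side and the Java side, so the pattern you actually need is "\\\\\\\\". Roughly: R's string parser reduces the eight backslashes to four, Spark's SQL string literal handling (with default settings) reduces those to two, and the Java regex engine reads the remaining two as one literal backslash.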
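As a minimal sketch of that pattern in use, assuming a local Spark installation and a made-up two-row sample (the `df` contents and the table name `words` are illustrative, not taken from the question):

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical sample data. In R source, "\\" denotes one backslash character.
df <- data.frame(word = c("abc\\xyz", "plain"), stringsAsFactors = FALSE)
words.sdf <- copy_to(sc, df, "words", overwrite = TRUE)

# Eight backslashes in R source reach the regex engine as one escaped backslash.
result <- words.sdf %>%
  mutate(word = regexp_replace(word, "\\\\\\\\", "")) %>%
  collect()

spark_disconnect(sc)
```

With this sample, the backslash in the first word should be removed and the second word left unchanged.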
Depending on your exact requirements, it might be easier to match all unwanted characters at once. You could, for example, preserve only word characters (\w) and whitespace (\s), or word characters only.
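Both variants can be sketched with negated character classes. This assumes the same double escaping as above, a local Spark installation, and hypothetical sample data; the output column names `words_and_spaces` and `words_only` are made up for illustration. Note that \w also matches digits and the underscore.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical sample data containing several kinds of special characters.
df <- data.frame(word = c("(foo bar: 1)!", "a\\b+c?"), stringsAsFactors = FALSE)
words.sdf <- copy_to(sc, df, "words", overwrite = TRUE)

result <- words.sdf %>%
  mutate(
    # Remove everything that is neither a word character nor whitespace.
    words_and_spaces = regexp_replace(word, "[^\\\\w\\\\s]", ""),
    # Remove everything that is not a word character (whitespace goes too).
    words_only = regexp_replace(word, "[^\\\\w]", "")
  ) %>%
  collect()

spark_disconnect(sc)
```

A single negated class like this replaces the whole chain of per-character mutate calls, at the cost of also removing any punctuation you did not list explicitly.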