How to remove '\' from a string in sparklyr

I am using sparklyr and have a Spark DataFrame with a column word that contains words, some of which contain special characters that I want to remove. I was successful using regexp_replace with \\\\ before each special character, like this:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\(', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\)', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\+', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\?', '')) %>%
  mutate(word = regexp_replace(word, '\\\\:', '')) %>%
  mutate(word = regexp_replace(word, '\\\\;', '')) %>%
  mutate(word = regexp_replace(word, '\\\\!', ''))

Now I want to remove \. I have tried both:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\\', ''))

and:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\', ''))

But neither works.

Tags: r apache-spark text sparklyr

1 answer

爷、活的狠高调

#2 · 2019-07-26 13:19

You have to account for escaping on both the R side and the Java side, so what you actually need is "\\\\\\\\":

df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))

df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
  <chr>           <chr>                                         
1 "(abc\\zyx: 1)" (abczyx: 1)
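
To see why eight backslashes are needed in the R source, it can help to count characters at each layer. The following is a minimal base-R sketch that runs without a Spark connection. The `sql_unescape` helper is hypothetical: it only mimics, in simplified form, a SQL string literal collapsing each pair of backslashes into one, and is not Spark's actual parser.

```r
# Base-R sketch of the escaping layers; no Spark connection needed.

# Layer 1: the R parser. Eight backslashes in source become four characters.
spark_pattern <- "\\\\\\\\"
nchar(spark_pattern)        # 4

# Layer 2 (simplified assumption): the SQL string literal parser collapses
# each pair of backslashes into one. sql_unescape is a hypothetical helper.
sql_unescape <- function(s) gsub("\\\\", "\\", s, fixed = TRUE)
regex_seen_by_java <- sql_unescape(spark_pattern)
nchar(regex_seen_by_java)   # 2

# Layer 3: the regex engine reads those two characters as one literal backslash.
# Base R's gsub() has no SQL layer, so it takes the two-character pattern directly.
x <- "(abc\\zyx: 1)"        # contains a single backslash
cleaned <- gsub(regex_seen_by_java, "", x)
cleaned                     # "(abczyx: 1)"
```

The same counting explains the question's working patterns: '\\\\(' reaches the regex engine as \(, a literal opening parenthesis.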

Depending on your exact requirements, it might be easier to match all unwanted characters at once. You could, for example, keep only word characters (\w) and whitespace (\s):

# Keep only word characters and whitespace; everything else is removed
df %>% mutate(regexp_replace(word, "[^\\\\w\\\\s]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w\\\\\\\\s]", "")`
  <chr>           <chr>                                              
1 "(abc\\zyx: 1)" abczyx 1

or keep word characters only:

# Keep only word characters; whitespace is removed as well
df %>% mutate(regexp_replace(word, "[^\\\\w]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w]", "")`
  <chr>           <chr>                                     
1 "(abc\\zyx: 1)" abczyx1
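
If the goal is to drop exactly the characters listed in the question plus the backslash, a single character class may be simpler than seven chained calls. The sketch below checks the regexes with base R's `gsub(..., perl = TRUE)`, which is an assumption standing in for Spark's Java regex engine. The two agree on these simple classes, but in sparklyr each backslash in the pattern has to be doubled once more, as explained at the top of this answer. It also shows that `+` inside a character class is a literal plus sign rather than a quantifier.

```r
# Base-R check of the patterns; perl = TRUE stands in for Java's regex engine
# (an assumption that holds for these simple character classes).
x <- "(abc\\zyx: 1)+?!;"

# One class covering ( ) + ? : ; ! and the backslash.
# In sparklyr the same class would be written "[()+?:;!\\\\\\\\]".
listed_removed <- gsub("[()+?:;!\\\\]", "", x, perl = TRUE)
listed_removed    # "abczyx 1"

# Keep only word characters and whitespace.
kept_word_space <- gsub("[^\\w\\s]", "", x, perl = TRUE)
kept_word_space   # "abczyx 1"

# A "+" inside [...] is literal, so this class also keeps plus signs.
plus_kept <- gsub("[^\\w+\\s+]", "", x, perl = TRUE)
plus_kept         # "abczyx 1+"
```

An explicit list removes only what you name, while a negated class such as [^\w\s] removes everything you did not think of, so the choice depends on how predictable the input is.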
