Replace single backslash in R

Posted 2020-01-23 05:34

Question:

I have a string that looks like:

str<-"a\f\r"

I'm trying to remove the backslashes but nothing works:

gsub("\","",str, fixed=TRUE)
gsub("\\","",str)
gsub("(\)","",str)
gsub("([\])","",str)

...basically all the variations you can imagine. I have even tried the string_replace_all function. ANY HELP??

I'm using R version 3.1.1; Mac OSX 10.7; the dput for a single string in my vector of strings gives:

dput(line)
"ud83d\ude21\ud83d\udd2b"

I imported the file using readLines from a standard .txt file. The content of the file looks something like: got an engineer booked for this afternoon \ud83d\udc4d all now hopefully sorted\ud83d\ude0a I m going to go insane ud83d\ude21\ud83d\udd2b in utf8towcs …

Thanks.

Answer 1:

When inputting backslashes from the keyboard, always escape them.

str <- "this\\is\\my\\string"      # doubled backslashes in the source -> 'this\is\my\string'
gsub("\\", "", str, fixed=TRUE)   # "\\" is one literal backslash -> "thisismystring"

str2 <- "a\\f\\r"                  # doubled again -> 'a\f\r'
gsub("\\", "", str2, fixed=TRUE)  # -> "afr"

Note that if you do

str <- "a\f\r"

then str contains no backslashes. It consists of the 3 characters a, \f (form feed, which is not normally printable, except as \f), and \r (carriage return, same).

And just to head off a possible question: if your data was read from a file, the file doesn't have to have doubled backslashes. For example, if you have a file test.txt containing
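
A quick way to check this is to count and inspect the characters. The following is a minimal sketch in base R; the variable names are only illustrative.

```r
# "\f" (form feed) and "\r" (carriage return) are single control characters
s <- "a\f\r"

n_chars <- nchar(s)                            # number of characters in the string
has_backslash <- grepl("\\", s, fixed = TRUE)  # is there a literal backslash?
codes <- utf8ToInt(s)                          # code points of each character

print(n_chars)        # 3
print(has_backslash)  # FALSE
print(codes)          # 97 12 13
```

Since no backslash is present, any gsub call that targets backslashes leaves this string unchanged.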

a\b\c\d\e\f

and you do

str <- readLines("test.txt")

then str will contain the string a\b\c\d\e\f as you'd expect: 6 letters separated by 5 single backslashes. But you still have to type doubled backslashes if you want to work with it.

str <- gsub("\\", "", str, fixed=TRUE)  # now contains abcdef
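
The file round trip can be reproduced without an existing test.txt. This sketch assumes a temporary file created with tempfile stands in for the real one; note that print shows backslashes escaped (doubled), while cat shows the actual characters.

```r
# Write a line containing single backslashes to a temporary file
path <- tempfile(fileext = ".txt")
writeLines("a\\b\\c\\d\\e\\f", path)   # the file holds: a\b\c\d\e\f

raw_line <- readLines(path)            # single backslashes, exactly as in the file
cleaned  <- gsub("\\", "", raw_line, fixed = TRUE)

cat(raw_line, "\n")      # a\b\c\d\e\f
print(nchar(raw_line))   # 11 (6 letters + 5 backslashes)
print(cleaned)           # "abcdef"

unlink(path)             # remove the temporary file
```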

From the dput, it looks like what you've got there is UTF-16 encoded text, which probably came from a Windows machine. According to

  • https://en.wikipedia.org/wiki/Unicode#Character_General_Category
  • https://en.wikipedia.org/wiki/UTF-16

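it encodes glyphs in the Supplementary Multilingual Plane, which is pretty obscure. I'll guess that you need to read the file in as UTF-16. Note that the encoding argument of readLines only marks strings as Latin-1 or UTF-8 and does not re-encode the input; to re-encode, set the encoding on the connection instead, e.g. readLines(file("test.txt", encoding="UTF-16")).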



Answer 2:

One quite universal solution is

gsub("\\\\", "", str)

Thanks to the comment above.
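
For comparison, here is a small sketch of why four backslashes are needed in regex mode while two suffice with fixed=TRUE. The sample string is made up for illustration.

```r
x <- "this\\is\\my\\string"   # actual content: this\is\my\string

# Regex mode: the R string "\\\\" holds two backslash characters,
# which the regex engine reads as one escaped (literal) backslash
via_regex <- gsub("\\\\", "", x)

# Fixed mode: the R string "\\" holds one backslash, matched literally
via_fixed <- gsub("\\", "", x, fixed = TRUE)

print(via_regex)                        # "thisismystring"
print(identical(via_regex, via_fixed))  # TRUE
```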



Answer 3:

This might be helpful :)

require(stringi)
stri_escape_unicode("ala\\ma\\kota")
## [1] "ala\\\\ma\\\\kota"
stri_unescape_unicode("ala\\ ma\\ kota")
## [1] "ala ma kota"


Answer 4:

Since there isn't any direct way to deal with single backslashes, here's the closest solution to the problem, as provided by David Arenburg in the comments section

gsub("[^A-Za-z0-9]", "", str) # remove everything except letters and digits
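
Be aware that this pattern also strips spaces and punctuation, which may be more than you want. A small sketch with a made-up sample string, compared against removing only the backslash:

```r
x <- "got an engineer\\booked, ok?"   # contains one literal backslash

stripped       <- gsub("[^A-Za-z0-9]", "", x)        # drops everything but letters and digits
only_backslash <- gsub("\\", "", x, fixed = TRUE)   # drops just the backslash

print(stripped)        # "gotanengineerbookedok"
print(only_backslash)  # "got an engineerbooked, ok?"
```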


Answer 5:

This is the same idea as the accepted answer but removes less (only characters outside the printable ASCII range, space through ~):

gsub("[^ -~]", '', "a\f\r") 
## [1] "a"
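
One caveat, sketched below: a literal backslash is itself a printable ASCII character (it sits between space and ~), so this pattern removes control characters such as \f and \r but should leave real backslashes in place. The result assumes the range is interpreted by code point, as in the output above; R's documentation notes that character ranges can be locale-dependent.

```r
pattern <- "[^ -~]"   # anything outside space (0x20) through tilde (0x7E)

ctrl <- gsub(pattern, "", "a\f\r")     # control characters are dropped
lit  <- gsub(pattern, "", "a\\f\\r")   # literal backslashes are kept

print(ctrl)        # "a"
cat(lit, "\n")     # a\f\r
print(nchar(lit))  # 5
```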