How to replace many special characters with “somet

Posted 2019-01-28 12:59

Question:

I have this sentence that contains the characters "&", "/" and "?".

c = "Do Sam&Lilly like yes/no questions?"

I want to add a space before and after each of the special characters to get:

"Do Sam & Lilly like yes / no questions ? "

I can only do this the hard way:

c = gsub("[&]", " & ", c)
c = gsub("[/]", " / ", c)
c = gsub("[?]", " ? ", c)
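For a handful of characters, the same "hard way" can at least be written as a loop. This is a minimal sketch that is not part of the original question: it uses fixed = TRUE so each character is matched literally instead of as a regex, and the names x and specials are illustrative.

```r
x <- "Do Sam&Lilly like yes/no questions?"
specials <- c("&", "/", "?")  # illustrative list of characters to pad

# One literal (fixed = TRUE) replacement per special character
for (ch in specials) {
  x <- gsub(ch, paste0(" ", ch, " "), x, fixed = TRUE)
}
x
# [1] "Do Sam & Lilly like yes / no questions ? "
```

This still makes one pass over the string per character, which is why a single regex with a capture group scales better as the list grows.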

But imagine that I have many of these special characters, which calls for a character class such as [:alnum:]. So I am really looking for a solution that looks like this:

gsub("[[:alnum:]]", " [[:alnum:]] ", c)

Unfortunately, I cannot use [:alnum:] as the second argument this way.
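A quick check shows why (a sketch; the short string "a&b" is made up for illustration). In the replacement argument, "[[:alnum:]]" is not a pattern but literal text; only back-references such as "\\1" are special there. Note also that [[:alnum:]] matches the letters and digits, not the special characters, so each letter is replaced by the literal text while "&" is left alone.

```r
x <- "a&b"
# Each alphanumeric character is replaced by the literal string " [[:alnum:]] "
res <- gsub("[[:alnum:]]", " [[:alnum:]] ", x)
res
# [1] " [[:alnum:]] & [[:alnum:]] "
```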

Answer 1:

You can use a capture group reference:

gsub("([&/])", " \\1 ", c)

Here we replace "&" or "/" with itself ("\\1") padded with spaces. "\\1" means "use the first matched group from the pattern", where a matched group is the portion of a regular expression enclosed in parentheses; in our case, "([&/])".

You can extend this to cover more symbols or special characters by adding them to the character set, or by using an appropriate predefined regex class such as [[:punct:]].
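Both extensions can be sketched on the question's sentence (x is used as the variable name here). Adding "?" to the set gives exactly the requested output, and so does the POSIX class [[:punct:]], because "&", "/" and "?" are the only punctuation characters in this sentence. Whether [[:punct:]] is appropriate in general depends on which symbols you want padded; it also matches characters such as "." and "'".

```r
x <- "Do Sam&Lilly like yes/no questions?"

# Option 1: list the characters explicitly in the character set
explicit <- gsub("([&/?])", " \\1 ", x)
explicit
# [1] "Do Sam & Lilly like yes / no questions ? "

# Option 2: use the POSIX class that covers all punctuation characters
posix <- gsub("([[:punct:]])", " \\1 ", x)
posix
# [1] "Do Sam & Lilly like yes / no questions ? "
```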

Note: you probably shouldn't use c as a variable name, since c() is also a very commonly used base function.



Answer 2:

It seems like you mean this:

> c <- "Do Sam&Lilly like yes/no questions?"
> gsub("([^[:alnum:][:blank:]])", " \\1 ", c)
[1] "Do Sam & Lilly like yes / no questions ? "

[^[:alnum:][:blank:]] is a negated bracket expression built from POSIX character classes: it matches any character that is neither alphanumeric nor a horizontal whitespace character (space or tab). Putting the pattern inside a capturing group captures each special character. Replacing each match with a space, then \\1 (which refers to the character captured by the first group), then another space gives the desired output. You could also use [:space:] instead of [:blank:]; [:space:] additionally covers vertical whitespace such as newlines, so those would be left unpadded.
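The difference between the two classes only shows up when the text contains vertical whitespace. In this minimal sketch (the two-line string y is made up for illustration), the newline is neither alphanumeric nor blank, so the [:blank:] version pads it like a special character, while the [:space:] version leaves it alone.

```r
y <- "yes/no\nmaybe?"

# [:blank:] = space and tab only, so "\n" counts as a special character
with_blank <- gsub("([^[:alnum:][:blank:]])", " \\1 ", y)
with_blank
# [1] "yes / no \n maybe ? "

# [:space:] also includes newlines, so "\n" is not padded
with_space <- gsub("([^[:alnum:][:space:]])", " \\1 ", y)
with_space
# [1] "yes / no\nmaybe ? "
```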



Answer 3:

You can build your regex patterns outside of gsub and then pass them in. I see that BrodieG referred to the pattern enclosed in "(...)" as a "capture group". The material inside square brackets, "[...]", is called a "character class" on the R help page ?regex. The "\1" is a "back-reference", and since the regex help page seems to be silent on what to call strings enclosed in parentheses, I've probably just been pushed a bit further along in my understanding of regex terminology. For example:

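ct <- "Do Sam&Lilly like yes/no questions?"
your_chars <- c("!@#$%^&*", "()_+", "?/")
# Paste the pieces into one character class inside a capture group: "([!@#$%^&*()_+?/])"
patt <- paste0("([", paste0(your_chars, collapse = ""), "])")
gsub(patt, " \\1 ", ct)
#[1] "Do Sam & Lilly like yes / no questions ? "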

You need to use gsub rather than sub if you want to replace more than one instance in a character value.
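A minimal sketch of that difference on the question's sentence: sub stops after the first match, while gsub replaces every match.

```r
x <- "Do Sam&Lilly like yes/no questions?"
patt <- "([&/?])"

first_only <- sub(patt, " \\1 ", x)    # pads only the first match ("&")
first_only
# [1] "Do Sam & Lilly like yes/no questions?"

all_matches <- gsub(patt, " \\1 ", x)  # pads every match
all_matches
# [1] "Do Sam & Lilly like yes / no questions ? "
```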