I have this sentence that contains "& / ?".
c = "Do Sam&Lilly like yes/no questions?"
I want to add a whitespace before and after each of the special characters to get
"Do Sam & Lilly like yes / no questions ? "
I can only get this the hard way:
c = gsub("[&]", " & ", c)
c = gsub("[/]", " / ", c)
c = gsub("[?]", " ? ", c)
But imagine that I have many of these special characters, which warrants using a character class such as [:alnum:]. So I am really looking for a solution that looks like this:
gsub("[[:alnum:]]", " [[:alnum:]] ", c)
Unfortunately, I cannot use [:alnum:] as the second argument this way.
You can use a capture group reference:
gsub("([&/])", " \\1 ", c)
Here we replace "&" or "/" with themselves ("\\1") padded with spaces. The "\\1" means "use the first matched group from the pattern". A matched group is a portion of a regular expression in parentheses, in our case "([&/])".
You can expand this to cover more symbols / special characters by adding them to the character set, or by putting in an appropriate regex special character.
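As a small sketch of what extending the character set might look like (the extra symbols and the variable names x and padded are illustrative choices, not from the question):

```r
x <- "Do Sam&Lilly like yes/no questions?"

# Add more literal symbols to the bracket expression as needed.
# Here "?", "!" and "," are listed alongside "&" and "/".
padded <- gsub("([&/?!,])", " \\1 ", x)
print(padded)
# [1] "Do Sam & Lilly like yes / no questions ? "
```

Keep in mind that a few characters, such as "]", "^" and "-", have a special meaning inside square brackets, so their position in the set matters.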
Note: you probably shouldn't use c as a variable name, since it is also the name of a very commonly used function.
It seems like you mean this:
> c <- "Do Sam&Lilly like yes/no questions?"
> gsub("([^[:alnum:][:blank:]])", " \\1 ", c)
[1] "Do Sam & Lilly like yes / no questions ? "
[^[:alnum:][:blank:]] is a negated POSIX character class that matches any character that is neither alphanumeric nor a horizontal whitespace character (space or tab). Putting the pattern inside a capturing group captures each special character. Replacing each matched character with space + \\1 (which refers to the character captured by the first group) + space gives you the desired output. You could also use [:space:] instead of [:blank:].
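To illustrate the difference between the two classes, here is a minimal sketch on a made-up two-line string (the input and the variable names are assumptions for illustration only):

```r
x <- "Sam&Lilly\nyes/no"

# [:blank:] covers only space and tab, so the newline is treated as a
# "special" character and gets padded with spaces as well.
with_blank <- gsub("([^[:alnum:][:blank:]])", " \\1 ", x)

# [:space:] also covers newlines, so the line break is left untouched.
with_space <- gsub("([^[:alnum:][:space:]])", " \\1 ", x)

print(with_blank)
# [1] "Sam & Lilly \n yes / no"
print(with_space)
# [1] "Sam & Lilly\nyes / no"
```

Which one is preferable probably depends on whether the input can contain line breaks that should be preserved as-is.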
You can build your regex patterns outside of gsub and then pass them in. I see that BrodieG referred to the pattern enclosed in "(...)" as a "capture group". The material inside square brackets, "[...]", is called a "character class" in the R help page ?regex. The "\\1" is a "back-reference", and since the regex help page seems to be silent on the matter of what to call strings enclosed in parentheses, I've probably just been pushed a bit further along in my understanding of regex terminology:
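ct <- "Do Sam&Lilly like yes/no questions?"
# Collect the special characters, then wrap them in a bracketed capture group
your_chars <- c("!@#$%^&*", "()_+", "?/")
patt <- paste0("([", paste0(your_chars, collapse = ""), "])")
gsub(patt, " \\1 ", ct)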
#[1] "Do Sam & Lilly like yes / no questions ? "
You would need to use gsub rather than sub if you want to replace more than one instance in a character value.
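A quick side-by-side sketch of the two functions on the sentence from the question (the variable names here are only illustrative):

```r
x <- "Do Sam&Lilly like yes/no questions?"

# sub() replaces only the first match in each string
first_only <- sub("([&/?])", " \\1 ", x)

# gsub() replaces every match
all_matches <- gsub("([&/?])", " \\1 ", x)

print(first_only)
# [1] "Do Sam & Lilly like yes/no questions?"
print(all_matches)
# [1] "Do Sam & Lilly like yes / no questions ? "
```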