Remove everything before the last space

Posted 2020-03-29 03:01

I have the following string. I tried to remove everything before the last space, but I can't seem to get it to work.

I tried to follow this post

Use gsub remove all string before first white space in R

str <- c("Veni vidi vici")

gsub("\\s*","\\1",str)
# => [1] "Venividivici"

What I want is for only the string "vici" to remain after removing everything before the last space.

1 answer
家丑人穷心不美
Reply #2 · 2020-03-29 03:06

Your gsub("\\s*","\\1",str) call replaces each match of zero or more whitespace characters with the value of capturing group #1, which is an empty string since your pattern does not define any capturing group. So the spaces are deleted and all the words are glued together.

You want to match everything up to and including the last whitespace, and replace that match with an empty string:

sub(".*\\s", "", str)

If you do not want a blank result when your string has trailing whitespace, trim the string first:

sub(".*\\s", "", trimws(str))
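To make the effect of trimws() concrete, here is a minimal sketch that runs both variants over a small vector. The sample strings in x are hypothetical inputs chosen for illustration: a plain string, one with a trailing space, one with no whitespace at all, and one separated by tabs.

```r
# Hypothetical sample inputs for illustration only.
x <- c("Veni vidi vici", "Veni vidi vici ", "vici", "Veni\tvidi\tvici")

# Without trimming, the trailing space in x[2] is the last whitespace,
# so the whole string matches and the result is an empty string.
no_trim <- sub(".*\\s", "", x)

# trimws() strips leading/trailing whitespace first, so the last
# *inner* whitespace decides where the match ends.
with_trim <- sub(".*\\s", "", trimws(x))

print(no_trim)
print(with_trim)
```

Strings with no whitespace at all, like x[3], come back unchanged because sub() leaves non-matching elements as they are. The tab-separated x[4] works too, since \s in an R (TRE) pattern also matches tabs.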

Alternatively, use the handy stri_extract_last_regex function from the stringi package with a simple \S+ pattern (one or more non-whitespace characters):

library(stringi)
stri_extract_last_regex(str, "\\S+")
# => [1] "vici"
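A short sketch of how the stringi approach behaves on the same kind of edge cases, assuming the stringi package is installed. The input vector x is hypothetical. Because \S+ only ever matches non-whitespace, the trailing space needs no trimws() call.

```r
library(stringi)

# Hypothetical inputs: plain, trailing space, single word, all blanks.
x <- c("Veni vidi vici", "Veni vidi vici ", "vici", "   ")

# Extract the last run of non-whitespace characters from each element.
last_word <- stri_extract_last_regex(x, "\\S+")
print(last_word)
```

Unlike the sub() approach, an element containing only whitespace yields NA here (there is nothing for \S+ to match) instead of being returned unchanged.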

Note that .* matches zero or more characters, as many as possible (* is a greedy quantifier, and in a TRE pattern . matches any character, including line breaks), so at first it consumes the whole string. Then backtracking starts, because the regex engine still needs to match a whitespace character with \s. Giving back one character at a time from the end of the string, the engine reaches the last whitespace and completes the match there; that match is then replaced with the empty string.
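One way to see what the greedy pattern actually matches is to extract the match instead of deleting it. The sketch below uses base R regexpr() and regmatches(). The lazy .*? comparison is an added illustration, not part of the answer above; perl = TRUE is assumed for it so the lazy quantifier has well-defined PCRE semantics.

```r
str <- c("Veni vidi vici")

# Greedy: the match runs up to and including the LAST whitespace.
greedy_match <- regmatches(str, regexpr(".*\\s", str))

# Lazy (illustrative contrast): .*? stops at the FIRST whitespace.
lazy_match <- regmatches(str, regexpr(".*?\\s", str, perl = TRUE))

print(greedy_match)  # "Veni vidi "
print(lazy_match)    # "Veni "
```

The greedy match "Veni vidi " is exactly the part that sub(".*\\s", "", str) removes, which is why only "vici" is left.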

See the R demo and a regex demo online:

str <- c("Veni vidi vici")
gsub(".*\\s", "", str)
# => [1] "vici"

Also, you may want to see how backtracking works in the regex debugger:

[Screenshot: regex debugger trace of the match, with red arrows marking backtracking steps]

Those red arrows show backtracking steps.
