I've got the following vector:
words <- c("5lang","kasverschil2","b2b")
I want to remove "5" in "5lang" and "2" in "kasverschil2". But I do NOT want to remove "2" in "b2b".
gsub("^\\d+|\\d+$", "", words)
#[1] "lang" "kasverschil" "b2b"
Another option would be to use stringi
library(stringi)
stri_replace_all_regex(words, "^\\d+|\\d+$", "")
#[1] "lang" "kasverschil" "b2b"
Using a variant of the data set provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):
words <- rep(c("5lang","kasverschil2","b2b"), 100000)
library(stringi)
library(microbenchmark)
GSUB <- function() gsub("^\\d+|\\d+$", "", words)
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
GREGEXPR <- function() {
  gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
  sapply(regmatches(words, mm, invert=TRUE), paste, collapse="")
}
microbenchmark(
GSUB(),
STRINGI(),
GREGEXPR(),
times=100L
)
## Unit: milliseconds
## expr min lq median uq max neval
## GSUB() 301.0988 349.9952 396.3647 431.6493 632.7568 100
## STRINGI() 465.9099 513.1570 569.1972 629.4176 738.4414 100
## GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904 100
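Timings only mean something if the candidates return the same result, so a quick equivalence check is worth running first. The following is a minimal sketch under assumptions, not part of the original benchmark: it uses a smaller `rep()` count to stay fast, the names `base_agree` and `stringi_agree` are made up for illustration, and the stringi comparison runs only if that package happens to be installed.

```r
# Same pattern and sample data as in the benchmark, but with fewer repeats.
words <- rep(c("5lang", "kasverschil2", "b2b"), 1000)

GSUB <- function() gsub("^\\d+|\\d+$", "", words)
GREGEXPR <- function() {
  mm <- gregexpr(pattern = "(^[0-9]+|[0-9]+$)", text = words)
  sapply(regmatches(words, mm, invert = TRUE), paste, collapse = "")
}

# The two base R approaches should agree element by element.
base_agree <- identical(GSUB(), GREGEXPR())

# Optional: compare stringi as well, but only if the package is available.
if (requireNamespace("stringi", quietly = TRUE)) {
  stringi_agree <- identical(
    GSUB(),
    stringi::stri_replace_all_regex(words, "^\\d+|\\d+$", "")
  )
}
```

If `base_agree` is `TRUE`, any difference in the benchmark reflects speed only, not a difference in output.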
Find the digits at the beginning or end of each word, then keep everything else (invert=TRUE returns the pieces that were not matched). You need to collapse those pieces back into one string because a word can contain more than one match:
gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
sapply(regmatches(words, mm, invert=TRUE), paste, collapse="")
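To see why the collapse step is needed, it can help to look at the intermediate pieces. This is a small sketch of the same approach; the extra word "5lang2" and the variable names `pieces` and `cleaned` are added here for illustration and are not in the original question.

```r
# "5lang2" is an added example with digits at both ends.
words <- c("5lang", "kasverschil2", "b2b", "5lang2")

mm <- gregexpr(pattern = "(^[0-9]+|[0-9]+$)", text = words)

# invert = TRUE returns the text around the matches instead of the matches.
pieces <- regmatches(words, mm, invert = TRUE)
# pieces[[4]] is c("", "lang", ""): the text before, between and after the two matches

# Glue each word's pieces back together into a single string.
cleaned <- sapply(pieces, paste, collapse = "")
# cleaned is c("lang", "kasverschil", "b2b", "lang")
```

Without `collapse = ""`, a word with matches at both ends would stay split into several pieces.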
You can use gsub, which uses regular expressions:
gsub("^[0-9]|[0-9]$", "", words)
# [1] "lang" "kasverschil" "b2b"
Explanation:
The pattern ^[0-9] matches a single digit at the beginning of a string, while the pattern [0-9]$ matches a single digit at the end of the string. Separating these two patterns with | means either the first or the second may match. Each match is then replaced with an empty string. To remove a run of several digits rather than just one, use [0-9]+ in both places.
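The anchored patterns behave differently once a word starts or ends with more than one digit. Here is a short sketch of that difference; the `multi` vector is hypothetical input made up for this illustration.

```r
# Hypothetical input with two digits at the start or end of a word.
multi <- c("55lang", "kasverschil22", "b2b")

# Without a quantifier, only one digit is stripped from each end.
single_digit <- gsub("^[0-9]|[0-9]$", "", multi)
# single_digit is c("5lang", "kasverschil2", "b2b")

# With "+", the whole leading or trailing run of digits is stripped.
digit_runs <- gsub("^[0-9]+|[0-9]+$", "", multi)
# digit_runs is c("lang", "kasverschil", "b2b")
```

In both cases "b2b" is left alone, because its digit touches neither the start nor the end of the string.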