Extracting numbers from vectors of strings

Posted 2019-01-01 01:00

I have a character vector like this:

years <- c("20 years old", "1 years old")

I would like to extract only the number from each element of this vector. The expected output is a vector:

c(20, 1)

How do I go about doing this?

Tags: regex r
9 answers
一个人的天荒地老
#2 · 2019-01-01 01:24

Here's an alternative to Arun's first solution, with a simpler Perl-like regular expression:

as.numeric(gsub("[^\\d]+", "", years, perl=TRUE))
柔情千种
#3 · 2019-01-01 01:27

Extract numbers that appear at the beginning of each string:

x <- gregexpr("^[0-9]+", years)  # Numbers with any number of digits
x2 <- as.numeric(unlist(regmatches(years, x)))

Extract numbers regardless of their position in the string:

x <- gregexpr("[0-9]+", years)  # Numbers with any number of digits
x2 <- as.numeric(unlist(regmatches(years, x)))
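A quick sketch of how the two patterns differ, using a made-up vector `mixed` (my own example, not from the question). It also shows that `unlist()` drops elements with no match, so one possible way to keep the result aligned with the input is to take the first match or `NA`:

```r
# Hypothetical input: digits at the start, in the middle, and absent
mixed <- c("20 years old", "aged 7", "no digits")

# Anchored pattern: only matches digits at the very start of a string
start_only <- regmatches(mixed, gregexpr("^[0-9]+", mixed))
# Unanchored pattern: matches every run of digits anywhere in a string
anywhere <- regmatches(mixed, gregexpr("[0-9]+", mixed))

# regmatches() returns a list; elements without a match are character(0),
# so unlist() drops them and the result can be shorter than the input
from_start <- as.numeric(unlist(start_only))
from_anywhere <- as.numeric(unlist(anywhere))

# Keep one value per input element: first match, or NA when there is none
first_num <- sapply(anywhere, function(m) if (length(m)) as.numeric(m[1]) else NA_real_)
```

With this input, `from_start` has one element, `from_anywhere` has two, and `first_num` has three (the last one is `NA`).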
零度萤火
#4 · 2019-01-01 01:31

How about

# capture the first run of digits and drop everything after it
as.numeric(gsub("([0-9]+).*$", "\\1", years))

or

# simply remove the literal text " years old"
as.numeric(gsub(" years old", "", years))

or

# split on spaces and take the first piece of each string
as.numeric(sapply(strsplit(years, " "), "[[", 1))
旧时光的记忆
#5 · 2019-01-01 01:33

Based on a post by Gabor Grothendieck on the r-help mailing list:

years <- c("20 years old", "1 years old")

library(gsubfn)
pat <- "[-+.e0-9]*\\d"
sapply(years, function(x) strapply(x, pat, as.numeric)[[1]])
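The pattern also accepts signs, decimal points and exponents. Here is a rough base R sketch of the same idea for anyone who does not have gsubfn installed. It swaps `strapply` for `regmatches`/`gregexpr` and writes `\\d` as `[0-9]`; the vector `vals` is a made-up example, not from the question:

```r
# Hypothetical inputs with a sign, a decimal point and an exponent
vals <- c("-3.5 degrees", "1e3 units")

# Same character class as above, ending on a digit
pat <- "[-+.e0-9]*[0-9]"

# Base R stand-in for strapply(): pull out the matches, then convert
nums <- as.numeric(unlist(regmatches(vals, gregexpr(pat, vals))))
nums
```

The letter "e" in words such as "degrees" is not picked up, because every match has to end in a digit.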
步步皆殇っ
#6 · 2019-01-01 01:37

Or simply:

as.numeric(gsub("\\D", "", years))
# [1] 20  1
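
One caveat, sketched with made-up inputs (not from the question): deleting every non-digit character joins separate numbers together and drops decimal points, so this is probably safest when each string holds a single non-negative integer.

```r
# Hypothetical inputs chosen to show the edge cases
tricky <- c("3 cats and 4 dogs", "1.5 kg")

# "3" and "4" are glued into "34"; "1.5" loses its decimal point
merged <- as.numeric(gsub("\\D", "", tricky))
merged
# [1] 34 15
```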
还给你的自由
#7 · 2019-01-01 01:37

You could get rid of all the letters too:

as.numeric(gsub("[[:alpha:]]", "", years))

Likely this is less generalizable though.
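
A small sketch of where it breaks down, using a made-up string with trailing punctuation (not from the question). Leftover spaces are tolerated by `as.numeric`, but any other leftover character turns the result into `NA`:

```r
clean <- "20 years old"
punct <- "20 years old."   # hypothetical input with a trailing period

# Only spaces remain around "20", which as.numeric() accepts
ok <- as.numeric(gsub("[[:alpha:]]", "", clean))

# The period survives, so the conversion fails (warning suppressed here)
bad <- suppressWarnings(as.numeric(gsub("[[:alpha:]]", "", punct)))
```

Here `ok` is 20 and `bad` is `NA`.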
