str_extract specific patterns (example)

I'm still a little confused by regex syntax. Can you please help me with these patterns:

_A00_A1234B_
_A00_A12345B_
_A1_A12345_

My approaches so far:

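# Only keeps split pieces that are exactly 7 characters long (e.g. "A12345B")
vapply(strsplit(files, "[_.]"), function(parts) parts[nchar(parts) == 7][1], character(1))

# Requires exactly 5 digits followed by a mandatory capital letter
str_extract(str2, "[A-Z][0-9]{5}[A-Z]")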

The expected outputs are:

A1234B
A12345B
A12345

Thanks!

Tags: regex, r

4 answers

够拽才男人

Reply #2 · 2020-03-04 07:13

You can do this without a regular expression:

x <- c('_A00_A1234B_', '_A00_A12345B_', '_A1_A12345_')
sapply(strsplit(x, '_', fixed = TRUE), '[', 3)
# [1] "A1234B"  "A12345B" "A12345"
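Index 3 works because strsplit() keeps the empty string in front of the leading underscore but drops the one after the trailing underscore. A minimal sketch of a type-safe variant, assuming you would rather get NA than an error for inputs with fewer than three pieces (the names parts and third_token are just for illustration):

```r
x <- c('_A00_A1234B_', '_A00_A12345B_', '_A1_A12345_')

# strsplit() keeps the empty string before the leading "_",
# so parts[[1]] is c("", "A00", "A1234B") and the token is element 3.
parts <- strsplit(x, "_", fixed = TRUE)

# vapply() guarantees one character value per input string;
# inputs with fewer than three pieces yield NA instead of failing.
third_token <- vapply(
  parts,
  function(p) if (length(p) >= 3) p[3] else NA_character_,
  character(1)
)
third_token
# [1] "A1234B"  "A12345B" "A12345"
```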

If you insist on using a regular expression, the following will suffice.

regmatches(x, regexpr('[^_]+(?=_$)', x, perl = TRUE))
# [1] "A1234B"  "A12345B" "A12345"
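The (?=_$) lookahead needs perl = TRUE because R's default regex engine (TRE) does not support lookarounds. As an alternative, here is a sketch that swaps the lookahead for a capture group so it runs on the default engine, assuming every string ends with an underscore:

```r
x <- c('_A00_A1234B_', '_A00_A12345B_', '_A1_A12345_')

# ^.*_     : everything up to the second-to-last underscore (greedy)
# ([^_]+)  : the token we want, containing no underscores
# _$       : the underscore at the very end of the string
last_token <- sub("^.*_([^_]+)_$", "\\1", x)
last_token
# [1] "A1234B"  "A12345B" "A12345"
```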


何必那么认真

Reply #3 · 2020-03-04 07:16

Using the rex package to construct the regular expression may make it easier to understand.

library(rex)

x <- c("_A00_A1234B_", "_A00_A12345B_", "_A1_A12345_")

# Approach #1: assumes the value always sits between the second and third underscores.
re_matches(x,
  rex(
    "_",
    anything,
    "_",
    capture(anything),
    "_"
  )
)

#>         1
#> 1  A1234B
#> 2 A12345B
#> 3  A12345


# Approach #2: assumes a letter, followed by 4 or 5 digits, with an optional trailing letter.
re_matches(x,
  rex(
    capture(
      alpha,
      between(digit, 4, 5),
      maybe(alpha)
    )
  )
)

#>         1
#> 1  A1234B
#> 2 A12345B
#> 3  A12345
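If you want the same result without the rex dependency, a base-R sketch of approach #2 could use regexec() to pull out the capture group. The hand-written pattern below is assumed to be roughly what the rex expression describes; it is not copied from rex's output:

```r
x <- c("_A00_A1234B_", "_A00_A12345B_", "_A1_A12345_")

# Assumed equivalent of approach #2: one letter, 4 or 5 digits,
# an optional letter, all inside a single capture group.
pattern <- "([[:alpha:]][[:digit:]]{4,5}[[:alpha:]]?)"

# regexec() records the whole match plus each group;
# regmatches() turns that into a list of character vectors.
m <- regmatches(x, regexec(pattern, x))

# Element 2 is the first capture group; no match gives character(0), so use NA.
captured <- vapply(
  m,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1)
)
captured
# [1] "A1234B"  "A12345B" "A12345"
```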


狗以群分

Reply #4 · 2020-03-04 07:24

Given

vec <- c("_A00_A1234B_", "_A00_A12345B_", "_A1_A12345_")

you can use sub with this regex:

sub(".*([A-Z]\\d{4,5}[A-Z]?).*", "\\1", vec)
# [1] "A1234B"  "A12345B" "A12345"
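One thing to keep in mind with this approach: when the pattern does not match, sub() returns the input string unchanged, whereas str_extract() would return NA. A minimal sketch that guards against that with grepl(), using a hypothetical extra element "_no_match_" for illustration:

```r
vec <- c("_A00_A1234B_", "_A00_A12345B_", "_A1_A12345_", "_no_match_")

pattern <- ".*([A-Z]\\d{4,5}[A-Z]?).*"

# sub() alone would return "_no_match_" as-is; checking with grepl()
# first lets non-matching elements become NA instead.
extracted <- ifelse(grepl(pattern, vec), sub(pattern, "\\1", vec), NA_character_)
extracted
# [1] "A1234B"  "A12345B" "A12345"  NA
```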


贼婆χ

Reply #5 · 2020-03-04 07:25

You can try

library(stringr)
str_extract(str2, "[A-Z][0-9]{4,5}[A-Z]?")
#[1] "A1234B"  "A12345B" "A12345"

Here, the pattern looks for a capital letter [A-Z], followed by 4 or 5 digits [0-9]{4,5}, followed by an optional capital letter [A-Z]?.
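To see each part of the pattern at work, here is a small sketch that builds it piece by piece and anchors it with ^...$ so it is tested against whole tokens. The token list is taken from the split-up example strings:

```r
# The same pattern, assembled from its three parts.
pattern <- paste0(
  "[A-Z]",      # one capital letter
  "[0-9]{4,5}", # four or five digits
  "[A-Z]?"      # zero or one capital letter
)

# Anchored, the pattern rejects "A00" and "A1" (too few digits)
# and accepts both the 6- and 7-character target tokens.
tokens <- c("A00", "A1", "A1234B", "A12345B", "A12345")
matches_token <- grepl(paste0("^", pattern, "$"), tokens)
matches_token
# [1] FALSE FALSE  TRUE  TRUE  TRUE
```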

Or you can use stringi, which may be slightly faster since stringr is built on top of it:

library(stringi)
stri_extract(str2, regex = "[A-Z][0-9]{4,5}[A-Z]?")
#[1] "A1234B"  "A12345B" "A12345"

Or a base R option would be

regmatches(str2, regexpr('[A-Z][0-9]{4,5}[A-Z]?', str2))
#[1] "A1234B"  "A12345B" "A12345"

data

str2 <- c('_A00_A1234B_', '_A00_A12345B_', '_A1_A12345_')
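A caveat with the base R option: regmatches() on a regexpr() result returns only the matched substrings, so the output can be shorter than the input. A minimal sketch that keeps one result per input, using a hypothetical non-matching element "_no_match_":

```r
str2 <- c('_A00_A1234B_', '_no_match_', '_A1_A12345_')

m <- regexpr('[A-Z][0-9]{4,5}[A-Z]?', str2)

# Non-matches are dropped here, so this has length 2, not 3.
matched_only <- regmatches(str2, m)

# regexpr() reports -1 for non-matches; fill NA everywhere else.
out <- rep(NA_character_, length(str2))
out[m != -1] <- regmatches(str2, m)
out
# [1] "A1234B" NA       "A12345"
```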
