Regex - Extract a substring from a given string

I have this string: "This is a string: AAA123456789".

So the idea here is to extract the substring AAA123456789 using a regex.

I am using this together with XPath.

Note: if there is an existing post on this, kindly point me to it.

I think I should use something like substring(myNode, [^AAA\d+{9}]),

but I am not really sure about the regex part.

The idea is to extract the substring that starts with "AAA" and is followed by digits only, exactly 9 consecutive digits.

Tags: regex xpath substring

4 Answers

仙女界的扛把子

Reply #2 · 2020-06-13 06:26

Pure XPath solution:

substring-after('This is a string: AAA123456789', ': ')

produces:

AAA123456789

XPath 2.0 solutions:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[starts-with(., 'AAA')]

or:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[matches(., 'AAA\d+')]

or:

replace('This is a string: AAA123456789 but not an double',
              '^.*?(A+\d+).*$',
              '$1'
              )
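
One thing worth checking with the replace() variant is how the leading .* interacts with the capture group. Below is a minimal sketch in Python's re module, used here only as a stand-in (an assumption: its regex flavor is not identical to XPath 2.0's, but greedy and reluctant quantifiers behave the same way for this pattern). It compares a greedy .* with a reluctant .*? in front of (A+\d+):

```python
import re

text = "This is a string: AAA123456789 but not an double"

# Greedy leading ".*" gives back only as much as it must, so the group
# starts at the last "A" that is directly followed by a digit.
greedy = re.sub(r"^.*(A+\d+).*$", r"\1", text)

# Reluctant ".*?" stops at the first position where the group can match,
# so all three "A"s end up inside the group.
reluctant = re.sub(r"^.*?(A+\d+).*$", r"\1", text)

print(greedy)     # A123456789
print(reluctant)  # AAA123456789
```

With the greedy form only the last A survives in the group, so the reluctant form is the one that returns the full AAA123456789.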


我想做一个坏孩纸

Reply #3 · 2020-06-13 06:33

First, I'm pretty sure you don't mean to have the [^ ... ]. That defines a negated character class, i.e. your current regex says, "Give me a single character that is not one of the following: A, a digit, +, { or }". You probably meant, plainly, "AAA(\d{9})". Now, according to this handy website, XPath does support capture groups, as well as backreferences, so take your pick:

"AAA(\d{9})"

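and extract $1, the first capture group, or:

"(?<=AAA)\d{9}"

and take the whole match ($0). Note that lookbehind is not part of the XPath 2.0 regular expression syntax, so this second form only works in engines that support it.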
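
To make the difference between the bracketed pattern and the plain one concrete, here is a small sketch using Python's re module. That choice is an assumption made purely for illustration: Python's flavor is not XPath's, but character classes and capture groups read the same in both.

```python
import re

text = "This is a string: AAA123456789"

# The bracketed form is a negated character class: it matches ONE character
# that is not "A", a digit, "+", "{" or "}".
one_char = re.search(r"[^AAA\d+{9}]", text).group(0)

# The plain pattern matches the literal "AAA" followed by exactly nine digits.
# Group 0 is the whole match, group 1 is just the digits.
m = re.search(r"AAA(\d{9})", text)
whole, digits = m.group(0), m.group(1)

print(one_char)  # T
print(whole)     # AAA123456789
print(digits)    # 123456789
```

So the bracketed version stops at the very first character of the input, while the capture-group version finds the code itself.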


贼婆χ

Reply #4 · 2020-06-13 06:40

Can you try this:

A{3}(\d{9})


男人必须洒脱

Reply #5 · 2020-06-13 06:42

Alright, after going through the answers and comments by the wonderful people here, I have summarized my findings in the solution I opted for. Here goes:

concat("AAA", substring(substring-after(., "AAA"), 1, 9))

First, substring-after takes everything that follows "AAA". Then substring(..., 1, 9) keeps only the first 9 characters of that; anything beyond them is ignored. Because "AAA" was used as the delimiter, it is not part of the result, so I concatenate it back onto the front. In short, I get the first 9 digits after AAA and then prepend AAA, since it is static data.

This keeps the result correct no matter what else the string contains.
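
For anyone who wants to try the same logic outside XPath, here is a rough Python equivalent. The function name extract_code and its parameters are hypothetical, made up for this sketch, and it mirrors the expression step by step rather than improving on it.

```python
def extract_code(value: str, prefix: str = "AAA", length: int = 9) -> str:
    # Like substring-after(): keep what follows the first occurrence of the
    # prefix; if the prefix is missing, the tail is the empty string.
    _, _, tail = value.partition(prefix)
    # Like substring(tail, 1, 9) followed by concat(): first nine characters,
    # with the static prefix put back in front.
    return prefix + tail[:length]


print(extract_code("This is a string: AAA123456789"))  # AAA123456789
print(extract_code("x AAA1234567890123"))              # AAA123456789
print(extract_code("nothing here"))                    # AAA
```

Note that, just like the XPath expression, this does not check that the nine characters are digits, and it returns a bare "AAA" when the prefix is not found.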

But I like the regex by @Dimitre, the replace part. The tokenize approach less so: what if the separator isn't a space? The replace with a regex is also wonderful. Thanks.

And thanks to the rest of you out there too.
