awk extract multiple groups from each line

How do I perform an action on every matching group when the pattern matches multiple times in a line?

To illustrate, I want to search for /Hello! (\d+)/ and use the numbers, for example print them or sum them. So for the input

abcHello! 200 300 Hello! Hello! 400z3
ads
Hello! 0

If I print them, I expect this output:

200
400
0

Tags: regex awk grouping

4 answers

smile是对你的礼貌

Reply #2 · 2019-01-15 14:21

This is gawk syntax. It also works when the pattern has no fixed text that could serve as a record separator, as long as a match never spans a linefeed:

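 {
     # Demo pattern; substitute your own regex with one capture group.
     pattern = "([a-g]+|[h-z]+)"
     # gawk only: the third argument of match() receives the capture groups.
     while (match($0, pattern, arr))
     {
         val = arr[1]
         print val
         # Delete the leftmost match from $0 so the next iteration finds the following one.
         sub(pattern, "")
     }
 }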


兄弟一词,经得起流年.

Reply #3 · 2019-01-15 14:25

gawk's match() reports only the leftmost match, so a single call cannot return the capture groups of every occurrence in a line unless you know exactly how many times the pattern repeats.

Otherwise you have to iterate over the matches in a line manually. For your example input, it would be:

{
  from = 1                                  # 1-based index in $0 where the next search starts
  pos = match( $0, /Hello! ([0-9]+)/, val )
  while( 0 < pos )
  {
    print val[1]
    from += pos + val[0, "length"] - 1      # pos is relative to substr( $0, from )
    pos = match( substr( $0, from ), /Hello! ([0-9]+)/, val )
  }
}

If the pattern should match across a linefeed, you have to change the input record separator, RS.
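
As a minimal sketch of the RS remark, assume records are separated by blank lines (paragraph mode, RS = "") and that the whitespace between "Hello!" and the number may include a newline. Both are assumptions for illustration, as is the helper name hello_paragraphs; the loop itself uses only portable match(), RSTART and RLENGTH.

```shell
# hello_paragraphs: hypothetical helper name, not from the original answer.
# Prints every number that follows "Hello!" even when a newline sits between them.
hello_paragraphs() {
    awk '
    BEGIN { RS = "" }                       # paragraph mode: blank lines separate records
    {
        rest = $0                           # work on a copy of the record
        while (match(rest, /Hello![ \n]+[0-9]+/)) {
            num = substr(rest, RSTART, RLENGTH)
            sub(/^Hello![ \n]+/, "", num)   # drop the prefix, keep the digits
            print num
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
    }
    ' "$@"
}

printf 'Hello!\n200 Hello! 300\n\nHello! 7\n' | hello_paragraphs
```

With this sample input the sketch prints 200, 300 and 7, one per line.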


叼着烟拽天下

Reply #4 · 2019-01-15 14:33

This is simple syntax that every awk (nawk, mawk, gawk, etc.) supports.

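{
    # match() sets RSTART and RLENGTH for the leftmost match.
    while (match($0, /Hello! [0-9]+/)) {
        pattern = substr($0, RSTART, RLENGTH);   # the matched text, e.g. "Hello! 200"
        sub(/Hello! /, "", pattern);             # strip the prefix, leaving the number
        print pattern;
        $0 = substr($0, RSTART + RLENGTH);       # keep only the text after this match
    }
}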
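
The question also mentions summing the numbers. The same portable loop can be adapted for that; the following is a sketch under the assumption that the prefix is always the 7 characters "Hello! ", and the helper name sum_hello is made up for illustration.

```shell
# sum_hello: hypothetical helper name, not from the original answers.
# Adds up every number that directly follows "Hello! " in the input.
sum_hello() {
    awk '
    {
        line = $0                           # work on a copy so $0 stays intact
        while (match(line, /Hello! [0-9]+/)) {
            # Skip the 7-character prefix "Hello! " to reach the digits.
            total += substr(line, RSTART + 7, RLENGTH - 7)
            line = substr(line, RSTART + RLENGTH)   # continue after this match
        }
    }
    END { print total + 0 }                 # "+ 0" prints 0 when nothing matched
    ' "$@"
}

printf 'abcHello! 200 300 Hello! Hello! 400z3\nads\nHello! 0\n' | sum_hello
```

For the sample input from the question this prints 600 (200 + 400 + 0).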


我欲成王，谁敢阻挡

Reply #5 · 2019-01-15 14:34

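With GNU awk, you can use the fixed text as the record separator. Note that the text before the first "Hello! " is processed as a record too, so input that starts with digits yields a spurious match: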

awk 'BEGIN{ RS="Hello! ";}
{
    gsub(/[^0-9].*/,"",$1)
    if ($1 != ""){ 
        print $1 
    }
}' file
