Circumvent the sed backreference limit \1 through

The sed manual clearly states that the backreferences available for the replacement string in a substitute command are numbered \1 through \9. I'm trying to parse a log file that has 10 fields.

I have the regex written, but the tenth match (and anything after it) isn't accessible.

Does anyone have an elegant way to circumvent this limitation in KSH (or any language that perhaps I can port to shell scripting)?

Tags: regex shell sed backreference

5 answers

Root（大扎）

#2 · 2019-01-14 13:16

Consider a solution that doesn't require regular expression backreferences at all. For example, if the fields are separated by a simple delimiter, use split, or even do the processing in awk instead of Perl.
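A minimal sketch of the split approach, assuming a hypothetical comma-delimited log line with 10 fields (the asker's real format isn't given). awk's split() puts every field into an array and returns the count, so there is no \1 to \9 ceiling:

```shell
#!/bin/sh
# Hypothetical 10-field, comma-delimited log line (assumed format).
line='2019-01-14,13:16,host1,app,INFO,pid=42,user=root,GET,/index.html,200'

# split() fills f[1]..f[n] and returns n; no backreferences are involved.
result=$(printf '%s\n' "$line" | awk '{
    n = split($0, f, ",")
    print n, f[10]     # field count, then the 10th field
}')
echo "$result"
```

This prints the field count followed by the tenth field (10 200 for the sample line).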


骚年 ilove

#3 · 2019-01-14 13:24

Split the substitution into several -e expressions, as long as each replacement only refers to groups captured within its own expression. When I needed to reorganize a date-time into a 14-digit string, I had to split the stream into 3 expressions:

echo "created: 02/05/2013 16:14:49" |  sed -e 's/^\([[:alpha:]]*: \)//' -e 's/\([0-9]\{2\}\)\(\/\)\([0-9]\{2\}\)\(\/\)\([0-9]\{4\}\)\( \)/\5\1\3/' -e 's/\([0-9]\{2\}\)\(\:\)\([0-9]\{2\}\)\(\:\)\([0-9]\{2\}\)/\1\3\5/'

20130205161449
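The same multi-expression idea can be sketched for a 10-field line. This borrows the p1=...,p10=... sample used in a later answer and assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do). The first expression rewrites fields 1 to 5 and leaves the rest of the line alone; the second then works on that already-rewritten line and handles fields 6 to 10, so neither expression needs more than 5 groups:

```shell
#!/bin/sh
x='p1=aaa,p2=bb,p3=cc,p4=dd,p5=ee,p6=ff,p7=gg,p8=hh,p9=ii,p10=jj'

# Each -e expression uses only 5 groups, so \10 is never needed.
# Expression 1 handles p1..p5; expression 2 sees its output and handles p6..p10.
result=$(printf '%s\n' "$x" | sed -E \
    -e 's/^p1=([^,]+),p2=([^,]+),p3=([^,]+),p4=([^,]+),p5=([^,]+),/\1 \2 \3 \4 \5 /' \
    -e 's/p6=([^,]+),p7=([^,]+),p8=([^,]+),p9=([^,]+),p10=([^,]+)$/\1 \2 \3 \4 \5/')
echo "$result"
```

The price is that the expressions must agree on where the split happens; here the first one deliberately leaves the trailing p6=...p10 text intact for the second.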


不美不萌又怎样

#4 · 2019-01-14 13:27

If you have GNU awk, you have much more control. For this you need the three-argument match(source, /regex/, array) form, which is a gawk extension.

Example:

Sample input for test:

 echo "$x"
p1=aaa,p2=bb,p3=cc,p4=dd,p5=ee,p6=ff,p7=gg,p8=hh,p9=ii,p10=jj

sed works fine up to \9:

echo $x |sed -r 's/p1=([^,]+).*p2=([^,]+).*p3=([^,]+).*p4=([^,]+).*p5=([^,]+).*p6=([^,]+).*p7=([^,]+).*p8=([^,]+).*p9=([^,]+)(.*)/\1 \2 \3 \4 \5 \6 \7 \8 \9/'
aaa bb cc dd ee ff gg hh ii

sed breaks when \10 is added: it is interpreted as \1 followed by a literal 0.

echo $x |sed -r 's/p1=([^,]+).*p2=([^,]+).*p3=([^,]+).*p4=([^,]+).*p5=([^,]+).*p6=([^,]+).*p7=([^,]+).*p8=([^,]+).*p9=([^,]+).*p10=([^,]+)(.*)/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10/'
aaa bb cc dd ee ff gg hh ii aaa0

awk comes to the rescue when you need more than 9 backreferences. Here the 10th reference is added:

echo "$x" |awk '{match($0,/p1=([^,]+).*p2=([^,]+).*p3=([^,]+).*p4=([^,]+).*p5=([^,]+).*p6=([^,]+).*p7=([^,]+).*p8=([^,]+).*p9=([^,]+).*p10=([^,]+)(.*)/,a);print a[1],a[2],a[3],a[4],a[5],a[6],a[7],a[8],a[9],a[10]}'
aaa bb cc dd ee ff gg hh ii jj
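If only a POSIX awk is available (mawk, BSD awk), the three-argument match() above won't work. A sketch of a portable alternative, assuming every field looks like pN=value: loop over the two-argument match(), using RSTART and RLENGTH to cut out one pair at a time, so the number of fields is unlimited and no capture groups are needed:

```shell
#!/bin/sh
x='p1=aaa,p2=bb,p3=cc,p4=dd,p5=ee,p6=ff,p7=gg,p8=hh,p9=ii,p10=jj'

result=$(printf '%s\n' "$x" | awk '{
    s = $0; out = ""; sep = ""
    while (match(s, /p[0-9]+=[^,]+/)) {
        kv = substr(s, RSTART, RLENGTH)   # one pair, e.g. "p10=jj"
        sub(/^[^=]*=/, "", kv)            # drop the "pN=" key
        out = out sep kv
        sep = " "
        s = substr(s, RSTART + RLENGTH)   # continue after this match
    }
    print out
}')
echo "$result"
```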


Root（大扎）

#5 · 2019-01-14 13:32

Can you use perl -pe 's/(match)(str)/$2$1/g;' instead of sed? The way to circumvent the backreference limit is to use something other than sed.

Also, I suppose you could do your substitution in two steps, but I don't know your pattern so I can't help you out with how.


贪生不怕死

#6 · 2019-01-14 13:33

You're asking for a shell script solution - that means you're not limited to using just sed, correct? Most shells support arrays, so perhaps you can parse the line into a shell array variable? If need be, you could even parse the same line multiple times, extracting different bits of information on each pass.

Would that do?
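A sketch of this idea in plain POSIX sh, assuming comma-separated fields and reusing the p1=...,p10=... sample from an earlier answer. The positional parameters serve as the one array every POSIX shell has (ksh's set -A or bash's read -a would work similarly). Note that, just like sed, the shell reads $10 as $1 followed by 0, so the tenth parameter must be written ${10}:

```shell
#!/bin/sh
x='p1=aaa,p2=bb,p3=cc,p4=dd,p5=ee,p6=ff,p7=gg,p8=hh,p9=ii,p10=jj'

# Split on commas into $1..$N.
old_ifs=$IFS
IFS=,
set -f            # disable globbing while the unquoted $x is split
set -- $x
set +f
IFS=$old_ifs

# Braces are required from the 10th parameter on; $10 would mean ${1}0.
tenth=${10}
value=${tenth#*=} # strip the "p10=" key, keep the value
echo "$# fields; field 10 is $value"
```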

