awk - remove character in regex

Published 2019-07-26 17:11

Question:

I want to use awk to remove the leading 1 from any field that matches this regex: ^1[0-9]{10}$. I have been trying to make it work with sub or substr for a few hours now, but I cannot find the correct logic. I already have a solution for sed, s/^1\([0-9]\{10\}\)$/\1/, and I need an awk equivalent.

Edit: adding an input and output example. Input:

10987654321
2310987654321
1098765432123    

(awk twisted and overcomplicated syntax)

Output:

0987654321
2310987654321
1098765432123    

Basically, the leading 1 should be removed only when it is followed by exactly ten digits. The second and third example lines must stay unchanged: the second has 23 in front of the 1, and the third has a leading 1 followed by twelve digits instead of ten. That is what the regex specifies.
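As a quick sanity check of that rule, the sketch below runs the three sample values through the regex with grep -E, which uses POSIX extended regular expressions and therefore supports the {10} interval. The variable names input and matches are only illustrative; nothing here is part of the original question.

```shell
# Sample input from the question, one value per line.
input='10987654321
2310987654321
1098765432123'

# Keep only the lines that match the regex: a leading 1
# followed by exactly ten digits and nothing else.
matches=$(printf '%s\n' "$input" | grep -E '^1[0-9]{10}$')

# Only the first sample value should be printed.
printf '%s\n' "$matches"
```

Only 10987654321 matches, so it is the only value whose leading 1 should be stripped.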

Answer 1:

With sub(), you could try:

awk '/^1[0-9]{10}$/ { sub(/^1/, "") }1' file

Or with substr():

awk '/^1[0-9]{10}$/ { $0 = substr($0, 2) }1' file

If you need to test each field, try looping over them:

awk '{ for(i=1; i<=NF; i++) if ($i ~ /^1[0-9]{10}$/) sub(/^1/, "", $i) }1' file
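Some awk implementations (for example older mawk builds) do not support interval expressions such as {10}, in which case the commands above silently fail to match. A possible workaround, sketched here under that assumption, is to check the field length separately and use + in the regex instead. The function name strip_leading_one is hypothetical and only used for illustration.

```shell
# Hypothetical helper: strips the leading 1 from every field that is
# exactly 11 characters long, starts with 1, and is otherwise all digits.
# This is equivalent to matching /^1[0-9]{10}$/ but avoids {10}.
strip_leading_one() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 11 && $i ~ /^1[0-9]+$/)
                $i = substr($i, 2)   # drop the first character
    } 1'
}

# Two fields on the first line, one field on the second.
result=$(printf '%s\n' '10987654321 2310987654321' '1098765432123' | strip_leading_one)
printf '%s\n' "$result"
```

Note that assigning to a field makes awk rebuild the record with the output field separator, so runs of whitespace between fields collapse to single spaces on any line that was modified.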

https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html



Answer 2:

If GNU awk is available, you could use the gensub function:

echo '10987654321'|awk '{s=gensub(/^1([0-9]{10})$/,"\\1","g");print s}'
0987654321

Edit: to do it for every field:

awk '{for(i=1;i<=NF;i++)$i=gensub(/^1([0-9]{10})$/,"\\1","g", $i)}7' file

Test:

kent$ echo '10987654321 10987654321'|awk '{for(i=1;i<=NF;i++)$i=gensub(/^1([0-9]{10})$/,"\\1","g", $i)}7'
0987654321 0987654321
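Because gensub is a gawk extension, a script that has to run on other systems might first check whether gawk is installed and otherwise fall back to plain awk. The following is only a sketch of that idea: the variable names sample and out are illustrative, and the fallback branch uses a length check plus substr, which is an assumption about what is acceptable rather than part of the answer above.

```shell
sample='10987654321 10987654321'

if command -v gawk >/dev/null 2>&1; then
    # gawk is available: use gensub on each field, as in the answer.
    out=$(echo "$sample" | gawk '{
        for (i = 1; i <= NF; i++)
            $i = gensub(/^1([0-9]{10})$/, "\\1", "g", $i)
    } 1')
else
    # No gawk: plain awk without gensub or interval expressions.
    out=$(echo "$sample" | awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 11 && $i ~ /^1[0-9]+$/)
                $i = substr($i, 2)
    } 1')
fi

printf '%s\n' "$out"
```

Either branch should print 0987654321 0987654321 for this sample line.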