awk/sed/grep extract part of lines with specific p

Posted 2019-07-18 18:01

Question:

I'm trying to extract specific parts of each line of a file like the one below:

1443113312 mongo client connection created with mongodb://172.28.128.5:27017
1443113312 [OVERALL], RunTime(ms), 4864.0
1443113313 [READ], Return=0, 485
1443113313 [CLEANUP], 99thPercentileLatency(us), 4487.0
1443113314 [UPDATE], 99thPercentileLatency(us), 27743.0

This is the output I'm expecting:

mongodb://172.28.128.5 Operations=OVERALL 1443113312
mongodb://172.28.128.5 Operations=READ    1443113313
mongodb://172.28.128.5 Operations=CLEANUP 1443113313
mongodb://172.28.128.5 Operations=UPDATE  1443113314

I'd really appreciate any suggestions. Thanks.

Answer 1:

$ awk -F'[][ \t:]+' '/mongodb/{a=$(NF-2)":"$(NF-1);next} a{printf "%s Operations=%-7s %s\n",a,$2,$1}' file
mongodb://172.28.128.5 Operations=OVERALL 1443113312
mongodb://172.28.128.5 Operations=READ    1443113313
mongodb://172.28.128.5 Operations=CLEANUP 1443113313
mongodb://172.28.128.5 Operations=UPDATE  1443113314

How it works

  • -F'[][ \t:]+'

    This sets the field separator to any run of one or more spaces, tabs, colons, or square brackets ([ and ]).

  • /mongodb/{a=$(NF-2)":"$(NF-1);next}

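    If the line contains mongodb, we join the third-to-last and second-to-last fields with a colon, save the result in the variable a, and use next to skip to the following line. Because : is also a separator, mongodb://172.28.128.5:27017 is split into mongodb, //172.28.128.5 and 27017, so re-inserting the colon restores mongodb://172.28.128.5 without the port.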

  • a{printf "%s Operations=%-7s %s\n",a,$2,$1}

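    If the variable a has been assigned a value, print the current line reformatted as in the question: the saved URL, the operation name ($2, left-aligned in a 7-character column by %-7s), and the timestamp ($1).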
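To see why $(NF-2) and $(NF-1) pick out the pieces of the URL and why $2 is the operation name, a small diagnostic can print every field awk produces with the same separator. This is a sketch; show_fields is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch: list the fields awk produces with the answer's separator regex
# [][ \t:]+ (runs of spaces, tabs, colons, or square brackets).
# show_fields is a hypothetical helper name used only for illustration.
show_fields() {
  awk -F'[][ \t:]+' '{
    out = "NF=" NF                 # number of fields on this line
    for (i = 1; i <= NF; i++)
      out = out " $" i "=" $i      # label each field with its index
    print out
  }' "$@"
}

# Feed one connection line and one data line from the question.
printf '%s\n' \
  '1443113312 mongo client connection created with mongodb://172.28.128.5:27017' \
  '1443113313 [READ], Return=0, 485' |
  show_fields
```

On the connection line, $7 is mongodb, $8 is //172.28.128.5 and $9 is the port 27017, so with NF=9 the answer's $(NF-2) ":" $(NF-1) rebuilds mongodb://172.28.128.5. On the data line, $1 is the timestamp and $2 is READ.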

Variation

This variation prints only the mongodb scheme (without the IP address), separates the columns with tabs, and puts the operation name in double quotes:

$ awk -F'[][ \t:]+' '/mongodb/{a=$(NF-2);next} a{printf "%s\tOperations=\"%s\"\t%s\n",a,$2,$1}' file
mongodb Operations="OVERALL"    1443113312
mongodb Operations="READ"       1443113313
mongodb Operations="CLEANUP"    1443113313
mongodb Operations="UPDATE"     1443113314


Answer 2:

Perl to the rescue!

perl -nwe 'if (m=mongo client connection created with (mongodb://[0-9.]+)=) {
               $url = $1;
           } elsif (/^([0-9]+) \[([[:upper:]]+)\]/) {
               print "$url Operations=$2 $1\n";
           }' input-file

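Explanation: -n wraps the code in a loop that reads the input line by line without printing it automatically (-w enables warnings, -e supplies the program). Each time the "connection created" line is encountered, the URL up to the IP address (the port is not matched) is saved in the $url variable. Each time a line starts with a number (presumably a timestamp) followed by an upper-case word in square brackets, the saved URL, the operation, and the timestamp are printed.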



Answer 3:

This might work for you (GNU sed & printf):

sed -rn '\|://|h;G;s/^(\S+) \[(\S+)\].* (\S+):.*/printf "%s Operations=%-7s %s" \3 \2 \1/ep' file

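This uses GNU sed's e flag, which executes the pattern space as a shell command and replaces it with the command's output. \|://|h copies the connection line to the hold space, and G appends it to every line, so a single substitution can capture the timestamp (\1), the operation (\2) and the URL without the port (\3). Alternatively, to avoid the e flag, the generated printf commands can be piped to a shell running in a separate process: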

sed -rn '\|://|h;G;s/^(\S+) \[(\S+)\].* (\S+):.*/printf "%s Operations=%-7s %s\n" \3 \2 \1/p' file | sh
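To make the h;G trick visible, a sketch like the following prints the pattern space the s/// command works on, with the embedded newline added by G replaced by " | ". It assumes a sed that accepts \n in a regex (GNU sed does); show_pattern_space is a hypothetical helper name, and unlike the answer it deletes the connection line with d so that only data lines are shown.

```shell
#!/bin/sh
# Sketch: show the pattern space the sed answer's s/// command sees after G.
# The connection line is copied to the hold space (h) and, here only, deleted
# (d); G appends "\n" plus the held line to each data line, and s/\n/ | /
# replaces that embedded newline so both parts appear on one output line.
# show_pattern_space is a hypothetical helper name.
show_pattern_space() {
  sed -n '\|://|{h;d;};G;s/\n/ | /p' "$@"
}

printf '%s\n' \
  '1443113312 mongo client connection created with mongodb://172.28.128.5:27017' \
  '1443113313 [READ], Return=0, 485' |
  show_pattern_space
```

In that combined string, the answer's regex takes \1 and \2 from the start of the data line, while the greedy .* lets (\S+): match the last space-separated token only up to its final colon, which yields mongodb://172.28.128.5 and drops the port.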


Tags: awk sed grep