One-liner to calculate the complete size of all messages

Posted 2019-08-03 13:22

Question:

Ok guys I'm really at a dead end here, don't know what else to try...

I am writing a script for some e-mail statistics. One of the things it needs to do is calculate the combined size of all messages in the maillog. This is what I have written so far:

egrep ' HOSTNAME sendmail\[.*.from=.*., size=' maillog | awk '{print $8}' |  
tr "," "+" | tr -cd '[:digit:][=+=]' | sed 's/^/(/;s/+$/)\/1048576/' |  
bc -ql | awk -F "." '{print $1}'

And here is a sample line from my maillog:

Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226:  
from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24,  
msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP,  
daemon=MTA1, relay=[1.1.1.1]

So I'll try to explain it step by step:

First I grep through the file to find all the lines containing the actual "size". Next I print the 8th field, in this case "size=40992,".

Next I replace all the comma characters with a plus sign.

Then I delete everything except the digits and the plus sign.

Then I replace the beginning of the line with a "(", and I replace the last extra plus sign with a ")" followed by "/1048576". So I get a huge expression looking like this:

"(1+2+3+4+5...+n)/1048576"

Because I want to add up all the individual message sizes and divide it so I get the result in MB.

The last awk command is there because the result is a decimal number and I don't really care about precision, so I just print the part before the decimal point.
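To make the steps above concrete, here is a minimal sketch that runs every stage except bc on two lines and prints the expression bc would receive. The first line is the sample from this post joined onto one line; the second is a hypothetical message that differs only in its size (1000 bytes). It assumes GNU tr and sed.

```shell
# Sample line from the post, joined onto a single line as it would appear in the log.
line1='Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226: from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24, msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]'
# Hypothetical second message, identical except for its size.
line2=$(printf '%s\n' "$line1" | sed 's/size=40992/size=1000/')

# Same stages as the pipeline above, stopping just before bc.
expr_text=$(printf '%s\n%s\n' "$line1" "$line2" |
  grep -E ' HOSTNAME sendmail\[.*.from=.*., size=' |
  awk '{print $8}' |
  tr "," "+" |
  tr -cd '[:digit:][=+=]' |
  sed 's/^/(/;s/+$/)\/1048576/')

echo "$expr_text"
```

For these two lines the sketch should print (40992+1000)/1048576, which is the form described above.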

The problem is, this doesn't work... And I could swear it was working at one point. Could it be that my expression is too long for bc to handle?

Thanks if you took the time to read through :)

Answer 1:

I think a one-line awk script will work too. It matches any line that your egrep pattern matches, then for those lines it splits the eighth field on the = sign and adds the second part (the number) to the SUM variable. When it reaches the END of the file it prints the value of SUM/1048576, i.e. the byte count in mebibytes.

awk '/ HOSTNAME sendmail\[.*.from=.*., size=/{ split($8,a,"=") ; SUM += a[2] } END { print SUM/1048576 }' maillog
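As a quick check of the idea, here is a sketch that feeds two lines to the same awk pattern on standard input and truncates the result to whole mebibytes with printf "%d", since the question does not need the decimals. The first line is the sample from the question; the second is a hypothetical 2 MiB message added only so the total exceeds 1 MiB. The variable names are made up for the example.

```shell
# Sample line from the question, joined onto a single line.
sample='Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226: from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24, msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]'
# Hypothetical second message of 2097152 bytes (2 MiB).
big=$(printf '%s\n' "$sample" | sed 's/size=40992/size=2097152/')

# a[2] is "40992," here; awk uses the leading digits when it converts to a number.
total_mb=$(printf '%s\n%s\n' "$sample" "$big" |
  awk '/ HOSTNAME sendmail\[.*.from=.*., size=/ { split($8, a, "="); sum += a[2] }
       END { printf "%d\n", sum / 1048576 }')

echo "$total_mb"
```

The two sizes add up to 2138144 bytes, so this should print 2.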


Answer 2:

  • bc chokes if its input does not end with a newline, which is the case with your expression because tr -cd deletes every newline along with the other unwanted characters. You have to change the sed part to:

sed 's/^/(/;s/+$/)\/1048576\n/'

  • The final awk will happily eat all your output if the total size is less than 1 MB, because bc then outputs something like .03333334234 with nothing before the decimal point. If you are not interested in the decimal part, remove that last awk command and the -l parameter from bc.

  • I'd do it with this one-liner:

grep ' HOSTNAME sendmail[[0-9][0-9]*]:..*:.*from=..*, size=' maillog | sed 's|.*, size=\([0-9][0-9]*\), .*|\1+|' | tr -d '\n' | sed 's|^|(|; s|$|0)/1048576\n|' | bc
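The second point can be seen without running bc at all. The sketch below starts from a hypothetical value in the form bc -l uses for results below 1 (no leading zero; this one corresponds to 40992/1048576) and compares the question's awk -F "." step with awk's int(), which is swapped in here purely as an illustration and is not part of the one-liner above.

```shell
# Hypothetical bc -l style output for a total below 1 MB (note: no leading zero).
bc_output='.03909301757812500000'

# Splitting on "." leaves an empty first field, so only an empty line comes out.
truncated=$(printf '%s\n' "$bc_output" | awk -F "." '{print $1}')
echo "awk -F. gives: '$truncated'"

# int() truncates the number itself and prints 0 instead.
whole=$(printf '%s\n' "$bc_output" | awk '{print int($1)}')
echo "int() gives: '$whole'"
```

Dropping -l from bc, as suggested above, has the same effect as int() here: bc then uses its default scale of 0 and the division is already truncated.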



Tags: bash sed awk grep bc