OK guys, I'm really at a dead end here and don't know what else to try.
I am writing a script for some e-mail statistics. One of the things it needs to do is calculate the total size of all messages in the maillog. This is what I have written so far:
egrep ' HOSTNAME sendmail\[.*.from=.*., size=' maillog | awk '{print $8}' |
tr "," "+" | tr -cd '[:digit:][=+=]' | sed 's/^/(/;s/+$/)\/1048576/' |
bc -ql | awk -F "." '{print $1}'
And here is a sample line from my maillog:
Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226:
from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24,
msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP,
daemon=MTA1, relay=[1.1.1.1]
So I'll try to explain it step by step:
First I grep through the file to find all the lines containing the actual "size". Next I print the 8th field, in this case "size=40992,".
Next I replace all the comma characters with a plus sign.
Then I delete everything except the digits and the plus sign.
Then I replace the beginning of the line with a "(", and I replace the last extra plus sign with a ")" followed by "/1048576". So I get a huge expression looking like this:
"(1+2+3+4+5...+n)/1048576"
That way all the individual message sizes are added up and the total is divided by 1048576, which gives me the result in MB.
The last awk command is there because bc gives me a decimal number and I really don't care about the precision, so I just print the part before the decimal point.
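To make the intermediate stages concrete, here is a small sketch that runs everything up to (but not including) bc on two sample lines and prints the expression that bc would receive. The addresses, message IDs and the second line are made up for illustration, and grep -E stands in for egrep.

```shell
# Two made-up log lines in the same layout as the sample (one entry per line).
sample='Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226: from=<a@domain.com>, size=40992, class=0, nrcpts=24, msgid=<1@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]
Nov 15 09:09:02 HOSTNAME sendmail[3230]: oAF88gWb003230: from=<b@domain.com>, size=1000, class=0, nrcpts=1, msgid=<2@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]'

# Same stages as the pipeline above, stopping just before bc.
# Note that tr -cd deletes the newlines too, so everything ends up on one line.
expression=$(printf '%s\n' "$sample" |
  grep -E ' HOSTNAME sendmail\[.*.from=.*., size=' |
  awk '{print $8}' |
  tr "," "+" | tr -cd '[:digit:][=+=]' |
  sed 's/^/(/;s/+$/)\/1048576/')

echo "$expression"
```

For these two lines it prints (40992+1000)/1048576, which is the shape of expression described above.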
The problem is, this doesn't work... And I could swear it was working at one point. Could it be that my expression is too long for bc to handle?
Thanks if you took the time to read through :)
I think a one-line awk script will work too. It matches any line that your egrep pattern matches, then for those lines it splits the eighth field on the = sign and adds the second part (the number) to the SUM variable. When it reaches the END of the file it prints the value of SUM/1048576 (the byte count in mebibytes).
sed 's/^/(/;s/+$/)\/1048576\n/'
This is your sed command with a \n added at the end of the replacement, so the expression ends with a newline (the tr -cd step strips all of them).
The final awk will happily eat all your output if the total size is less than 1MB, because bc then prints something like .03333334234 with nothing before the decimal point. If you are not interested in the decimal part, remove that last awk command and the -l parameter from bc; without -l, bc's scale defaults to 0 and the division yields an integer.
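For reference, the awk approach described a few lines up might look roughly like this. It is a reconstruction from that description, not the original poster's script. The sample lines and the total_mb variable are made up for illustration, and int() is used here to drop the decimal part.

```shell
# Two made-up log lines in the same layout as the sample (one entry per line).
sample='Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226: from=<a@domain.com>, size=40992, class=0, nrcpts=24, msgid=<1@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]
Nov 15 09:09:02 HOSTNAME sendmail[3230]: oAF88gWb003230: from=<b@domain.com>, size=3145728, class=0, nrcpts=1, msgid=<2@domain.com>, proto=ESMTP, daemon=MTA1, relay=[1.1.1.1]'

total_mb=$(printf '%s\n' "$sample" |
  awk '/ HOSTNAME sendmail\[.*.from=.*., size=/ {
         split($8, part, "=")   # $8 is e.g. "size=40992,", so part[2] is "40992,"
         SUM += part[2]         # awk uses the leading number and ignores the comma
       }
       END { print int(SUM / 1048576) }')   # bytes to MB, decimals dropped

echo "$total_mb"
```

With a real log you would pass maillog to awk as a file argument instead of piping in the sample. For the two lines above the total is 3186720 bytes, so it prints 3.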
I'd do it with this one-liner:
grep ' HOSTNAME sendmail\[[0-9][0-9]*]:..*:.*from=..*, size=' maillog | sed 's|.*, size=\([0-9][0-9]*\), .*|\1+|' | tr -d '\n' | sed 's|^|(|; s|$|0)/1048576\n|' | bc