I have a script that reads log files and parses the data to insert it into a MySQL table.
My script looks like:
while read x;do
var=$(echo ${x}|cut -d+ -f1)
var2=$(echo ${x}|cut -d_ -f3)
...
echo "$var,$var2,.." >> mysql.infile
done<logfile
The problem is that the log files are thousands of lines long and processing them takes hours.
I read that awk is better. I tried it, but I don't know the syntax to parse the variables.
EDIT:
The inputs are structured firewall logs, so they are pretty large files, like:
@timestamp $HOST reason="idle Timeout" source-address="x.x.x.x"
source-port="19219" destination-address="x.x.x.x"
destination-port="53" service-name="dns-udp" application="DNS"....
So I'm using a lot of grep for ~60 variables, e.g.
sourceaddress=$(echo ${x}|grep -P -o '.{0,0}source-address=\".{0,50}'|cut -d\" -f2)
If you think perl would be better, I'm open to suggestions, and maybe a hint on how to script it.
To answer your question, I assume the following rules of the game:
- each line contains various variables
- each variable can be found by a different delimiter.
This gives you the following awk script:
awk 'BEGIN{OFS=","}
{ FS="+"; $0=$0; var=$1;
FS="_"; $0=$0; var2=$3;
...
print var,var2,... >> "mysql.infile"
}' logfile
It basically does the following:
- set the output field separator to ,
- read a line
- set the field separator to +, re-parse the line ($0=$0) and determine the first variable
- set the field separator to _, re-parse the line ($0=$0) and determine the second variable
- ... continue for all variables
- print the line to the output file.
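As a rough, self-contained illustration of the re-parse steps above, here is the same idea reduced to two variables. The sample line and the function name split_demo are made up for the demonstration; the real log format from the question is not assumed.

```shell
# Hypothetical sample line (not taken from the real logs).
line='alpha+beta_gamma_delta_epsilon'

# split_demo is an illustrative helper name, not part of any tool.
split_demo() {
  awk 'BEGIN { OFS = "," }
  {
    FS = "+"; $0 = $0; var  = $1   # re-split on "+", keep field 1
    FS = "_"; $0 = $0; var2 = $3   # re-split on "_", keep field 3
    print var, var2
  }'
}

printf '%s\n' "$line" | split_demo
```

Because awk is started once for the whole file, rather than echo and cut being spawned several times per line, this should run far faster than the shell loop.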
The perl script below might help:
perl -ane '/^[^+]*/;printf "%s,",$&;/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
Since $& can incur a performance penalty, you could also use the /p modifier, as below:
perl -ane '/^[^+]*/p;printf "%s,",${^MATCH};/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
For more on perl regex matching, refer to [PerlDoc].
If you're extracting the values in order, something like this will help:
$ awk -F\" '{for(i=2;i<=NF;i+=2) print $i}' file
idle Timeout
x.x.x.x
19219
x.x.x.x
53
dns-udp
DNS
You can easily change the output format as well:
$ awk -F\" -v OFS=, '{for(i=2;i<=NF;i+=2)
printf "%s", $i ((i>NF-2)?ORS:OFS)}' file
idle Timeout,x.x.x.x,19219,x.x.x.x,53,dns-udp,DNS
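If the fields do not always appear in the same order, or only some of the ~60 are needed, a variant that looks values up by key name may be more robust. The following is only a sketch: it assumes every field has the form key="value" with no escaped quotes inside the value, and the function name extract_fields and the three printed keys are chosen purely for illustration.

```shell
# extract_fields is an illustrative helper name; pass file names or pipe input.
extract_fields() {
  awk -v OFS=, '{
    split("", kv)                          # clear the map for each line
    rest = $0
    # Repeatedly match key="value" and store the value under its key.
    while (match(rest, /[A-Za-z-]+="[^"]*"/)) {
      pair = substr(rest, RSTART, RLENGTH)
      eq   = index(pair, "=")
      key  = substr(pair, 1, eq - 1)
      val  = substr(pair, eq + 2, length(pair) - eq - 2)   # strip the quotes
      kv[key] = val
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Print only the keys of interest, in a fixed order.
    print kv["source-address"], kv["source-port"], kv["destination-port"]
  }' "$@"
}

# Sample call using the line from the question.
extract_fields <<'EOF'
@timestamp $HOST reason="idle Timeout" source-address="x.x.x.x" source-port="19219" destination-address="x.x.x.x" destination-port="53" service-name="dns-udp" application="DNS"
EOF
```

A key that is missing from a line simply prints as an empty field, which keeps the column count stable for the MySQL import file.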