I have to parse some information out of big log file lines. It's something like
abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100
There are many log lines like the above in the log files. I need to extract information like:
datetime, i.e. 2012-03-03 11:12:12,457
job details, i.e. 123.RPH.-101
Query, i.e. get_data (no parameters)
Rows, i.e. 10
Time, i.e. 100
So the output should look like
2012-03-03 11:12:12,457|123|-101|get_data|10|100
I have tried various permutations and combinations with awk but am not getting it right.
TXR:
Here is one way to make the program assert that every line in the log file must match the pattern. First, do not allow gaps in the collection, by giving it a :gap 0 constraint. This means that nonmatching material cannot be skipped in order to just look for the lines which match.
Secondly, at the end of the script we add @(eof).
This specifies a match on the end of file. If the @(collect) bails early because of a nonmatching line (due to the :gap 0 constraint), the @(eof) will fail and so the script will terminate with a failed status.
In this type of task, field-splitting and regex hacks will backfire because they can blindly produce incorrect results for some subset of the input being processed. If the input contains a vast number of lines, there is no easy way to check for mistakes. It's best to have a very specific match that is likely to reject anything which doesn't resemble the examples on which the pattern is based.
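As a rough awk analogue of this strictness idea (a sketch based only on the single sample line, not a translation of the TXR script), one can abort with a failing exit status on the first line that doesn't match a full-line pattern:

gawk '
# Fail as soon as a line does not have the expected shape,
# instead of silently mis-parsing it.
$0 !~ /^([^ ]*:)?[0-9-]+ [0-9:,]+ ABC\[[0-9]+\.RPH\.-?[0-9]+] XYZ: Query=[^ ]+ .* Rows=[0-9]+ ?Time=[0-9]+$/ {
    print "line " NR " does not match the expected format" > "/dev/stderr"
    exit 1
}
' abc.log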
Just need the right field separators.
I'm assuming the "abc.log:" is not actually in the log file.
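A sketch of the field-separator approach (not necessarily the original command; it assumes a space before Time=100, and the field numbers are guesses from the single sample line):

awk -F '[][ =.]+' -v OFS='|' '{ print $1 " " $2, $4, $6, $9, $14, $16 }' abc.log

Treating spaces, brackets, dots and = signs as separators makes the job numbers, the query name and the counters fall out as individual fields.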
Here's another, less fancy, AWK solution (but works in mawk too):
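A sketch in that spirit (not necessarily the original), using only plain awk constructs that mawk also supports, and assuming default whitespace splitting with no abc.log: prefix:

awk -v OFS='|' '
{
    # With default whitespace splitting:
    #   $1 = date, $2 = time, $3 = ABC[123.RPH.-101],
    #   $5 = Query=get_data, $7 = Rows=10, $8 = Time=100
    split($3, j, /[][.]/)        # j[2] = 123, j[4] = -101
    sub(/^Query=/, "", $5)
    sub(/^Rows=/,  "", $7)
    sub(/^Time=/,  "", $8)
    print $1 " " $2, j[2], j[4], $5, $7, $8
}' abc.log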
Note that this assumes the Rows=10 and Time=100 strings are separated by a space, i.e. that there was a typo in the question's example.
Well, this is really horrible, but since sed is in the tags and there are no answers yet...
My solution in gawk: it uses the gawk extension to match().
You didn't give a specification of the file format, so you may have to adjust the regexes.
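A sketch of what script.awk could look like (not necessarily the original), using gawk's three-argument match() to collect the capture groups; the regex is based only on the sample line:

# script.awk -- a sketch, not necessarily the original
{
    # The match starts at the timestamp, so a leading "abc.log:" prefix,
    # if present, is simply skipped over.
    if (match($0, /([0-9-]+ [0-9:,]+) ABC\[([0-9]+)\.RPH\.(-?[0-9]+)] XYZ: Query=([^ ]+) .* Rows=([0-9]+) ?Time=([0-9]+)/, m))
        print m[1], m[2], m[3], m[4], m[5], m[6]
}

With OFS set to | on the command line, print joins the captured pieces with the desired separator.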
Script invocation:
gawk -v OFS='|' -f script.awk