Regex to match logfile custom date formats

Posted 2019-08-17 16:48

I'm trying to extract the lines that fall within a date range from a file. However, the dates are formatted in a non-standard way. Is it possible for a regex to match them? The log file is formatted like so:

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar  7 16:32:00 more messages
Apr 21 16:32:00 more messages

For example, I want to match the lines between January 1st and February 10th, but I haven't been able to get a regex to handle the month order, since the months aren't numerical.

1 answer
贼婆χ
#2 · 2019-08-17 17:24

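The following shell pipeline might do the trick. Suppose you want the lines from January 1st through February 10th, i.e. the 41 days starting on January 1st. Then you can do:

pipeline of seq, date and grep:

export LC_ALL=C   # %b must print English month names (Jan, Feb, ...), as in the log
# date -d needs GNU date; the year only matters if the range includes Feb 29
seq 0 40 \
  | xargs -I{} date -d "2018-01-01 + {} days" +"%b %e" \
  | grep -F -f - <logfile>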

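This is probably the quickest, since grep does all of the matching. The idea is to build the set of all possible days in the range (the first lines of the pipeline, where %e pads single-digit days with a space just like the log does) and then search for them as fixed strings with grep. Note that grep -F matches these strings anywhere in a line, not only at its start.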
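Because grep -F matches the generated strings anywhere in a line, a message that merely mentions a date in the range would also be printed. Below is a minimal sketch that anchors each generated pattern to the start of the line instead. It assumes GNU date (for -d) and a grep that reads patterns from standard input with -f -; the function name grep_date_range is hypothetical.

```shell
# Sketch: same idea as the pipeline above, but each pattern becomes an
# anchored regex such as "^Jan  5 " instead of a fixed string.
# Assumes GNU date and GNU grep. grep_date_range is a hypothetical name.
export LC_ALL=C   # so %b prints English month names (Jan, Feb, ...)

# $1 = first day (YYYY-MM-DD), $2 = number of days, $3 = log file
grep_date_range() {
  seq 0 $(( $2 - 1 )) \
    | xargs -I{} date -d "$1 + {} days" +'^%b %e ' \
    | grep -f - "$3"
}
```

For the range in the question, grep_date_range 2018-01-01 41 logfile should print the January 1st to February 10th lines. The literal "^" in the date format is copied into each pattern, and the trailing space keeps "Jan  1" from also matching a longer token.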

sorted log-file with awk:

When the log file is sorted, you can return early: skip lines before the start date with next, and stop reading at the first line past the end date with exit, so only the needed part of the file is processed.

awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          # split the start and end dates into month name and day
          $0=tstart; ms=$1; ds=$2
          $0=tend  ; me=$1; de=$2
         }
  (month[$1]<month[ms])             { next }   # before the start month
  (month[$1]==month[ms]) && ($2<ds) { next }   # start month, before the start day
  (month[$1]==month[me]) && ($2>de) { exit }   # end month, past the end day: stop
  (month[$1]>month[me])             { exit }   # past the end month: stop
  1' <logfile>
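The four month/day rules above can be reduced to two comparisons by turning each "Mon DD" into a single number, month*100 + day (so "Feb 10" becomes 210). The sketch below does that for a sorted file. It assumes the whole log falls within one year; awk_sorted_range and key are hypothetical names, not part of the answer above.

```shell
# Sketch: early-exit filter for a SORTED log, comparing one numeric key
# per line instead of month and day separately. Assumes one year of logs.
# awk_sorted_range and key are hypothetical names.
# Usage: awk_sorted_range "Jan  1" "Feb 10" logfile
awk_sorted_range() {
  awk -v tstart="$1" -v tend="$2" '
    # "Feb 10" -> 2*100 + 10 = 210; anything that is not a month name -> -1
    function key(mon, day,    pos) {
      pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
      if (length(mon) != 3 || pos % 3 != 1) return -1
      return (pos + 2) / 3 * 100 + day
    }
    BEGIN {
      split(tstart, s, " "); split(tend, e, " ")
      ks = key(s[1], s[2]); ke = key(e[1], e[2])
    }
    { k = key($1, $2) }
    k < ks { next }   # before the range: skip the line
    k > ke { exit }   # past the range: stop reading the file
    1                 # inside the range: print the line
  ' "$3"
}
```

Splitting on " " uses awk's default whitespace splitting, so the double space in "Jan  1" is harmless.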

unsorted log-file with awk:

When the log file is unsorted, every line has to be read and compared against the range, so this is slower on large files.

awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          $0=tstart; ms=$1; ds=$2
          $0=tend  ; me=$1; de=$2
         }
   (ms == me) && ($1 == ms) && (ds<=$2) && ($2<=de) { print; next }
   (ms == me)                                       { next }  # same-month range: nothing else qualifies
   ($1 == ms) && (ds<=$2)                           { print; next }
   ($1 == me) && ($2<=de)                           { print; next }
   # any month strictly between the start and end months
   (month[ms]<month[$1]) && (month[$1]<month[me])' <logfile>

All three commands above return:

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages

Note: date ranges that cross December 31st will give wrong results with the awk versions, because they only compare month numbers and the log lines carry no year.
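If the log is known to span at most one year, a range that wraps past December 31st can still be handled. Turn each "Mon DD" into one number, month*100 + day; when the start number is larger than the end number, keep lines on or after the start date or on or before the end date. A minimal sketch for unsorted files under that assumption; awk_unsorted_range and key are hypothetical names:

```shell
# Sketch: filter an UNSORTED log (every line is checked) by a date range
# that may wrap past Dec 31, e.g. "Dec 20" to "Jan 10".
# Assumes the log spans at most one year, since the lines carry no year.
# awk_unsorted_range and key are hypothetical names.
# Usage: awk_unsorted_range "Dec 20" "Jan 10" logfile
awk_unsorted_range() {
  awk -v tstart="$1" -v tend="$2" '
    # "Feb 10" -> 2*100 + 10 = 210; anything that is not a month name -> -1
    function key(mon, day,    pos) {
      pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
      if (length(mon) != 3 || pos % 3 != 1) return -1
      return (pos + 2) / 3 * 100 + day
    }
    BEGIN {
      split(tstart, s, " "); split(tend, e, " ")
      ks = key(s[1], s[2]); ke = key(e[1], e[2])
    }
    {
      k = key($1, $2)
      if (k < 0)         keep = 0                      # not a dated line
      else if (ks <= ke) keep = (k >= ks && k <= ke)   # ordinary range
      else               keep = (k >= ks || k <= ke)   # wraps past Dec 31
    }
    keep   # print the line when keep is 1
  ' "$3"
}
```

The assumption matters: if the file held, say, two Januaries, both would match a "Dec 20" to "Jan 10" range, since nothing in the line says which year it belongs to.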
