How to convert a date string to timestamp in gawk?

Posted 2019-05-30 05:47

Question:

I am scanning through a log file formatted like this:

76.69.120.244 - - [09/Jun/2015:17:13:18 -0700] "GET /file.jpg HTTP/1.1" 200 22977 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36" "16543" "ewr1" "0.002" "CA" "Bell Canada" "2"
76.69.120.244 - - [09/Jun/2015:17:13:19 -0700] "GET /differentfile.bin HTTP/1.1" 206 453684 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36" "16543" "ewr1" "1.067" "CA" "Bell Canada" "2"

Inside gawk, I'm getting that request time using:

requesttime=$4;

What's the best way for me to parse that into a UTC/GMT based time, preferably an epoch timestamp?

I can at least always guarantee that it will be in -0700 if that helps; perhaps some kind of ugly string transformation to add those 7 hours on to it?

Answer 1:

This does the main part of the job: it converts your date and time to a number of seconds since the epoch. It ignores the -0700, and mktime() interprets the fields as being in your local time zone:

$ cat tst.awk
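BEGIN { FS="[][]" }                # split on [ or ], so $2 is the bracketed timestamp
{
    split($2,a,"[/: ]")            # a[1]=day a[2]=month name a[3]=year a[4..6]=h,m,s a[7]=offset
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
    a[2] = sprintf("%02d",(RSTART+2)/3)    # month name -> 2-digit month number
    secs = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6])    # read as local time
    print $2, "->", secs
}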

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433887998
09/Jun/2015:17:13:19 -0700 -> 1433887999

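You can then either do some math on the seconds or set the TZ environment variable appropriately before calling awk. For example (I don't know whether this is the right TZ for your data; with TZ=UTC the fields are read as UTC, so the 7 hours from the -0700 still have to be added to get a true epoch time):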

$ TZ=UTC awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433869998
09/Jun/2015:17:13:19 -0700 -> 1433869999

You can get the UTC offset of your current local time zone with strftime("%z"):

$ awk 'BEGIN{print strftime("%z")}'
-0500

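So a final solution that includes the offset calculation might look like this (check the math, since you didn't show your expected output and I may be misreading what your data means to you). Both offsets are in +hhmm/-hhmm form, so they are converted to seconds before being applied, and passing secs to strftime() gives the local offset in effect on the logged date rather than today's:

$ cat tst.awk
BEGIN { FS="[][]" }

# Convert a +hhmm/-hhmm offset such as "-0700" to seconds.
function offsecs(o,    sign) {
    o += 0                              # force a number: "-0700" -> -700
    sign = (o < 0 ? -1 : 1)
    o *= sign
    return sign * (int(o / 100) * 3600 + (o % 100) * 60)
}
{
    split($2,a,"[/: ]")
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
    a[2] = sprintf("%02d",(RSTART+2)/3)
    secs = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6])    # read as local time
    secs = secs + offsecs(strftime("%z", secs)) - offsecs(a[7])
    print $2, "->", secs
}

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433895198
09/Jun/2015:17:13:19 -0700 -> 1433895199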

Or, if you like brevity and puzzles ( ;-) ) and can assume whole-hour offsets and a local offset that is the same today as on the logged dates (-0500 here):

$ cat tst.awk
BEGIN { FS="[][]" }
{
    split($2,a,"[/: ]")
    print $2, "->", mktime(a[3]" "(match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])+2)/3" "a[1]" "a[4]" "a[5]" "a[6]) + (strftime("%z") - a[7])/100*60*60
}

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433895198
09/Jun/2015:17:13:19 -0700 -> 1433895199
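
A variation on the scripts above that avoids the local offset entirely is to run awk with TZ=UTC and subtract the offset given on each log line. This is only a minimal sketch: the function name to_epoch is hypothetical, and it assumes the timestamp always sits in whitespace-separated fields 4 and 5, as in the sample lines in the question.

```shell
# Hypothetical helper name; reads log lines from the named files or stdin.
to_epoch() {
    TZ=UTC awk '
    {
        # With the default FS, $4 is "[09/Jun/2015:17:13:18" and $5 is "-0700]".
        stamp = substr($4, 2)
        zone  = substr($5, 1, 5)
        split(stamp, a, "[/:]")
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", a[2]) + 2) / 3

        # Turn the +hhmm/-hhmm offset into seconds, keeping its sign.
        off  = zone + 0
        sign = (off < 0 ? -1 : 1)
        off *= sign
        offsecs = sign * (int(off / 100) * 3600 + (off % 100) * 60)

        # Under TZ=UTC mktime() reads the fields as UTC; subtracting the
        # offset of the log line then gives the true epoch time.
        secs = mktime(a[3] " " mon " " a[1] " " a[4] " " a[5] " " a[6]) - offsecs
        print stamp, zone, "->", secs
    }' "$@"
}
```

Called as `to_epoch file` on the two sample lines, this should print 1433895198 and 1433895199 whatever time zone the machine is set to, because neither the local offset nor daylight saving time enters the calculation.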


Answer 2:

Here is another solution, which uses gawk's system() function to call the external date command (the -d option is used here as GNU date understands it). Hope it helps.

$ awk 'BEGIN{FS="[][]"}{system("date \"+%s\" -d \""gensub("/", " ", "G", gensub(":", " ", "1", $2))"\"")}' file 
1433895198
1433895199
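
The one-liner above prints the result directly, since system() does not return the command's output. If the timestamp is needed inside the awk program, one option is to read the command's output with getline. This is a rough sketch, not the answer's own method: the function name date_epoch is hypothetical, it assumes GNU date and the field layout of the sample log lines, and it builds the date string with sub() and gsub() instead of gensub().

```shell
# Hypothetical helper name; needs GNU date for the -d option.
date_epoch() {
    awk '
    {
        # Build "09 Jun 2015 17:13:18 -0700" from fields 4 and 5.
        ts = substr($4, 2) " " substr($5, 1, 5)
        sub(/:/, " ", ts)       # first colon separates the date from the time
        gsub("/", " ", ts)      # slashes separate day, month and year
        cmd = "date -d \"" ts "\" +%s"
        if ((cmd | getline secs) > 0)
            print ts, "->", secs
        close(cmd)              # avoid running out of open pipes on big logs
    }' "$@"
}
```

As with the original one-liner, this starts one date process per log line, so it is likely to be much slower than mktime() on large files. It also passes text from the log to a shell, so it is only suitable for input you trust.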


Tags: bash awk gawk