How to use AWK regex to print multiple substrings

Posted 2019-09-22 13:45

Question:

I have a log file that contains millions of lines like these:

$ cat file.log
10.0.7.92 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&size=320x280&ak=AY1234&output=vast&version=1.1&sleepAfter=&requester=John&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
10.0.6.91 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&output=vast&version=1.1&sleepAfter=&requester=John&size=320x280&ak=AY1234&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1

For every line, I want to print output like this, so that it opens in Excel as separate columns:

inwapads    AY1234  john    320x280

How can I do that with awk, or do I need to use another method?

Answer 1:

If your input looks like this:

$ cat file.log
10.0.7.92 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&size=320x280&ak=AY1234&output=vast&version=1.1&sleepAfter=&requester=John&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
10.0.6.91 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&output=vast&version=1.1&sleepAfter=&requester=John&size=320x280&ak=AY1234&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1

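Then you can use awk on field $7 (the request path plus query string) with gensub(regexp, replacement, how, target), the general substitution function of GNU awk (gawk). Each call matches the whole field and returns only the captured group of interest: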

$ awk '{
    # $7 is the request target: path plus query string. gensub() requires GNU awk.
    item = gensub(/(^.*\/)(.*\/)(.*)(\/)(\?.*$)/, "\\3", 1, $7)        # last path segment
    ak   = gensub(/(^.*ak=)([A-Z]*[0-9]*)(&)(.*$)/, "\\2", 1, $7)      # ak parameter
    req  = gensub(/(^.*requester=)([A-Za-z]*)(&)(.*$)/, "\\2", 1, $7)  # requester parameter
    s    = gensub(/(^.*size=)([0-9]*x[0-9]*)(&.*$)/, "\\2", 1, $7)     # size parameter
    print item, ak, req, s
}' file.log

Output:

inwapads AY1234 John 320x280
inwapads AY1234 John 320x280
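
The regexes above expect an & after each value, so a parameter that happens to come last in the query string would not match, and gensub() would return the whole field unchanged. If that can occur in your logs, or gawk is not available, one possible alternative is to split the query string into key=value pairs using only POSIX awk functions. The following is a minimal sketch under some assumptions: the request target is always field $7, sample.log and out.tsv are placeholder file names, and the array name kv is only illustrative. It is a different technique from the gensub() approach, not a drop-in equivalent.

```shell
# Two sample lines in the same shape as file.log (user-agent shortened for brevity).
cat > sample.log <<'EOF'
10.0.7.92 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&size=320x280&ak=AY1234&output=vast&version=1.1&sleepAfter=&requester=John&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0" 0.000 1029 520 127.0.0.1
10.0.6.91 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&output=vast&version=1.1&sleepAfter=&requester=John&size=320x280&ak=AY1234&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0" 0.000 1029 520 127.0.0.1
EOF

awk 'BEGIN { OFS = "\t" }          # tab-separated output columns
{
    # Split the request target ($7) into path and query string at the first "?".
    q = index($7, "?")
    if (q == 0) next               # no query string: skip the line
    path  = substr($7, 1, q - 1)
    query = substr($7, q + 1)

    # Last non-empty path segment, e.g. "inwapads".
    sub(/\/+$/, "", path)
    n = split(path, seg, "/")
    item = seg[n]

    # Collect key=value pairs into an array, whatever their order.
    split("", kv)                  # portable way to clear the array
    m = split(query, pair, "&")
    for (i = 1; i <= m; i++) {
        eq = index(pair[i], "=")
        if (eq > 0) kv[substr(pair[i], 1, eq - 1)] = substr(pair[i], eq + 1)
    }
    print item, kv["ak"], kv["requester"], kv["size"]
}' sample.log > out.tsv

cat out.tsv
```

Because the fields are separated by tabs, out.tsv can usually be opened in or pasted into Excel with each value in its own column. A parameter that is missing from a line simply produces an empty column.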