Using sed on log files

Posted 2019-07-11 00:41

Question:

The original logfile sample:

"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&imu=medrect&pubchannel=filmannex&ad_unit=category_2&sdk_ver=2.4.1.3&embeddedIn=http%3A%2F%2Fwww.filmannex.com%2Fmovie%2Fend-of-the-tunnel%2F20872&sdk_url=http%3A%2F%2Fstatic2.filmannex.com%2Fflash%2F&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363, "http://static2.filmannex.com/flash/yume_ad_library.swf", pl.networks.com, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB7.3; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618; FunWebProducts; .NET4.0C)", "24_100_150_188_jZKFKQQjdRNM6e", "0rO0ABXd8AAAACgAAASQAAAaLAAAGiwAAASgAAAaLAAAGiwAAAVoAAAaLAAAGiwAAAVkAAAaKAAAGiwAAAdwAAAaKAAAGiwAAAhIAAAaKAAAGiwAAAhUAAAaKAAAGiwAAAhYAAAaKAAAGiwAAAhsAAAaKAAAGiwAAAiwAAAaKAAAGiw**", "-", "-", "@YD_1;233_2739", -, "-", "24.100.150.188", "199.127.205.6"

The required output is the 3rd and 4th fields of the viewport parameter:

971 0

I used the command:

sed -n 's/.*viewport=\([^&]*\)/\1 /p' filename

but I get the wrong output, with too much redundant info following the numbers:

10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363, .....

Can anyone help me with this problem? How can I use sed to fetch the 3rd and 4th parameters of viewport?

Thanks so much in advance :)

Answer 1:

Just use awk:

gawk 'match($0, /&viewport=[0-9]+,[0-9]+,([0-9]+),([0-9]+)/, m){print m[1], m[2]}'

Note: the third argument to match() is available only in gawk, so this script is gawk-specific. Explanation: we pass match() a regex that captures the third and fourth fields of viewport. match() returns a non-zero value if the regex matches some substring of the whole record, and the action then prints the captured groups m[1] and m[2].
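For awks without gawk's three-argument match(), a similar result can be had with POSIX match(), which only sets RSTART and RLENGTH: cut the matched text out with substr() and split it by hand. This is a sketch, not part of the original answer; the function name viewport_34 and the trimmed sample line (shortened from the question's log line) are illustrative.

```sh
#!/bin/sh
# Trimmed copy of the question's sample log line.
sample='"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363'

# Hypothetical helper: POSIX-awk version of answer 1.
# POSIX match() sets only RSTART/RLENGTH, so we extract and split manually.
viewport_34() {
    awk '
        match($0, /&viewport=[0-9]+,[0-9]+,[0-9]+,[0-9]+/) {
            m = substr($0, RSTART, RLENGTH)   # "&viewport=10,261,971,0"
            sub(/^&viewport=/, "", m)         # "10,261,971,0"
            split(m, f, ",")                  # f[1]..f[4]
            print f[3], f[4]
        }
    ' "$@"
}

printf '%s\n' "$sample" | viewport_34    # prints: 971 0
```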



Answer 2:

You have already extracted the right field; now feed the output to another tool:

sed ...... | awk -F, '{print $3, $4}'
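Assuming the elided `sed ......` stands for the asker's own command from the question, the whole pipeline can be sketched as below. That sed output still carries the trailing log text, but it begins with the comma-separated viewport values, so awk's $3 and $4 are the wanted numbers. The function name answer2 is hypothetical.

```sh
#!/bin/sh
# Trimmed copy of the question's sample log line.
sample='"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363'

# ASSUMPTION: the elided "sed ......" is the asker's command, copied unchanged.
# Its output starts "10,261,971,0,...", so comma-split fields 3 and 4 are the
# viewport values we want; the leftover log text only lands in later fields.
answer2() {
    sed -n 's/.*viewport=\([^&]*\)/\1 /p' |
        awk -F, '{print $3, $4}'
}

printf '%s\n' "$sample" | answer2    # prints: 971 0
```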


Answer 3:

Or if you want to use grep and cut (hey, not everything has sed and awk):

grep -o "&viewport=[0-9,]*" filename | grep -o "[0-9,]*" | cut -d "," -f 3,4

Or you could take your original command and pipe its output to the same cut:

sed -n 's/.*viewport=\([^&]*\) /\1/p' sedtest | cut -d "," -f 3,4
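Both cut pipelines print 971,0, because cut joins the selected fields with the delimiter. If you want exactly the space-separated 971 0 from the question, one option (an addition, not part of the answer) is to translate the comma with tr. This sketch relies on grep -o, which GNU and BSD grep provide but POSIX does not; the function name viewport_cut is hypothetical.

```sh
#!/bin/sh
# Trimmed copy of the question's sample log line.
sample='"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363'

# Answer 3's grep/cut pipeline, plus tr to turn the comma that cut keeps
# between fields 3 and 4 into a space.
viewport_cut() {
    grep -o "&viewport=[0-9,]*" |
        grep -o "[0-9,]*" |
        cut -d "," -f 3,4 |
        tr ',' ' '
}

printf '%s\n' "$sample" | viewport_cut    # prints: 971 0
```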

Also, the reason the rest of your text shows up is that [^&]* stops only at an &, and there is no & after viewport=, so your capture runs on to the end of the line. More generally, s/// replaces only the part of the line the pattern matches, so if you want to keep just the viewport parameters, you need to match the entire line, not just the beginning, by ending the pattern with .*. Also put a space in the negated character set so the capture stops at the space before HTTP/1.1:

sed -n 's/.*viewport=\([^& ]*\).*/\1/p' sedtest

You can then pipe that to cut as before (though with cut you don't need this latest change).
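Since the question asks for sed specifically, the two fields can also be picked out by sed alone: spell out the first four comma-separated values in the pattern and capture only the 3rd and 4th. This is a sketch rather than part of the answer; it assumes the 4th value is followed by a comma, &, a space, or the end of the line, and the function name viewport_sed is hypothetical.

```sh
#!/bin/sh
# Trimmed copy of the question's sample log line.
sample='"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363'

# Each "[^,]*," skips one viewport value and its comma; \( \) captures the
# two we want. "[^,& ]*" stops the 4th value at a comma, "&" or space, and
# the trailing ".*" swallows the rest of the line so nothing else is printed.
viewport_sed() {
    sed -n 's/.*viewport=[^,]*,[^,]*,\([^,]*\),\([^,& ]*\).*/\1 \2/p' "$@"
}

printf '%s\n' "$sample" | viewport_sed    # prints: 971 0
```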



Answer 4:

One way, using grep with a Perl regex (-P) and awk in a pipe:

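< file.txt grep -oP "viewport=[^ ]+" | awk -F "[=,]" '{ print $4, $5 }'

(With -F "[=,]", $1 is the word viewport itself, so the 3rd and 4th viewport values are $4 and $5.)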

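One way using awk alone (this needs GNU awk, because the RT variable, which holds the text that matched RS, is a gawk extension):

gawk -v RS="viewport=[^ ]+" 'RT != "" { split(RT, array, "[=,]"); print array[1 + 3], array[1 + 4] }' file.txt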

EDIT:

In the awk-only solution, I made it easier to select the viewport fields of interest: split() puts the word viewport itself in array[1], so the Nth viewport value is in array[1 + N]. If you would like the 5th and 6th fields, simply change array[1 + 3], array[1 + 4] to array[1 + 5], array[1 + 6]. Also, these solutions have the added advantage of finding multiple occurrences per line.
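To see the multiple-occurrence behaviour, here is the grep pipeline run on a made-up line (not from the real log) that contains viewport twice. This sketch swaps grep -oP for grep -oE, since the pattern uses nothing Perl-specific and -E is more widely supported than -P; the function name viewport_all is hypothetical.

```sh
#!/bin/sh
# Made-up input for illustration only: two viewport parameters on one line.
line='a viewport=1,2,3,4 b viewport=5,6,7,8 c'

# grep -o prints each match on its own line, so awk sees one record per
# occurrence. Splitting on "=" or "," makes $1 "viewport" and $2 the 1st
# value, so the 3rd and 4th values are $4 and $5.
viewport_all() {
    grep -oE "viewport=[^ ]+" | awk -F "[=,]" '{ print $4, $5 }'
}

printf '%s\n' "$line" | viewport_all    # prints "3 4", then "7 8"
```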



Answer 5:

Another awk-only solution:

awk '{split($0,a,"viewport=");split(a[2],b,",");print b[3],b[4]}' filename

yields

971 0

This splits the input line on the string "viewport=" into an array a, so a[2] holds everything after "viewport=". It then splits a[2] on commas into array b and prints the elements we are interested in, b[3] and b[4].
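Answer 5's approach generalises if the wanted positions are passed in with -v. In this sketch the function name viewport_fields and the awk variables f1 and f2 are hypothetical. It also skips lines without viewport= (the one-liner above prints a line containing just a space for those) and trims a[2] at the first space, so the last viewport value is not glued to the trailing ` HTTP/1.1"` text.

```sh
#!/bin/sh
# Trimmed copy of the question's sample log line.
sample='"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363'

# Hypothetical helper: print viewport values number $1 and $2 (1-based).
viewport_fields() {
    awk -v f1="$1" -v f2="$2" '
        {
            # Fewer than 2 pieces means the line has no viewport parameter.
            if (split($0, a, "viewport=") < 2) next
            sub(/ .*/, "", a[2])    # drop " HTTP/1.1", ... after the values
            split(a[2], b, ",")
            print b[f1], b[f2]
        }
    '
}

printf '%s\n' "$sample" | viewport_fields 3 4    # prints: 971 0
printf '%s\n' "$sample" | viewport_fields 7 8    # prints: 10 261
```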