The original logfile sample:
"GET /dynamic_preroll_playlist.fmil?domain=13nwuc&width=480&height=360&imu=medrect&pubchannel=filmannex&ad_unit=category_2&sdk_ver=2.4.1.3&embeddedIn=http%3A%2F%2Fwww.filmannex.com%2Fmovie%2Fend-of-the-tunnel%2F20872&sdk_url=http%3A%2F%2Fstatic2.filmannex.com%2Fflash%2F&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363, "http://static2.filmannex.com/flash/yume_ad_library.swf", pl.networks.com, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB7.3; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618; FunWebProducts; .NET4.0C)", "24_100_150_188_jZKFKQQjdRNM6e", "0rO0ABXd8AAAACgAAASQAAAaLAAAGiwAAASgAAAaLAAAGiwAAAVoAAAaLAAAGiwAAAVkAAAaKAAAGiwAAAdwAAAaKAAAGiwAAAhIAAAaKAAAGiwAAAhUAAAaKAAAGiwAAAhYAAAaKAAAGiwAAAhsAAAaKAAAGiwAAAiwAAAaKAAAGiw**", "-", "-", "@YD_1;233_2739", -, "-", "24.100.150.188", "199.127.205.6"
The required output is the 3rd and 4th fields of viewport:
971 0
I used the command:
sed -n 's/.*viewport=\([^&]*\)/\1 /p' filename
This gives the wrong output: 10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201, 1516, 16363, .....
There is too much redundant text following the values.
Can anyone help me with this problem? How can I use sed to fetch only the 3rd and 4th parameters of viewport?
Thanks so much in advance :)
Just use awk
gawk 'match($0, /&viewport=[0-9]+,[0-9]+,([0-9]+),([0-9]+)/, m){print m[1], m[2]}'
Note: the third argument to match is available only in gawk, so this script is gawk-specific.
Explanation: we pass a regex to the match function, which captures the third and fourth fields of viewport. match returns a non-zero value if the regex can be matched against some substring of the whole record, and the action then just prints the captured groups.
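If gawk is not available, a roughly equivalent approach in POSIX awk is to let match set RSTART and RLENGTH, cut the value out with substr, and split it on commas. This is only a sketch: the shortened log line and the names line, v, f and result are made up for illustration.

```shell
# Hypothetical, shortened log line standing in for the real sample.
line='"GET /x.fmil?domain=13nwuc&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201'

result=$(printf '%s\n' "$line" | awk '
  match($0, /viewport=[0-9,]+/) {
    # match() sets RSTART and RLENGTH; skip the 9 characters of "viewport="
    v = substr($0, RSTART + 9, RLENGTH - 9)
    split(v, f, ",")      # f[1]..f[8] now hold the viewport values
    print f[3], f[4]
  }')

printf '%s\n' "$result"
```

To read a real log instead, drop the printf and give awk the filename as its last argument.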
You stripped the right field; now feed the output to another tool:
sed ...... | awk -F, '{print $3, $4}'
Or if you want to use grep and cut (hey, not everything has sed and awk):
grep -o "&viewport=[0-9,]*" filename | grep -o "[0-9,]*" | cut -d "," -f 3,4
Or you could use your previous command and pass that off to the same cut.
sed -n 's/.*viewport=\([^&]*\) /\1/p' sedtest | cut -d "," -f 3,4
As for why it keeps the rest of your text: nothing in the pattern consumes or excludes it. [^&]* stops only at an &, and the pattern does not match through to the end of the line, so everything after the values is left in place. To keep just the viewport parameters, you need to substitute the entire line, not just the beginning, by adding a trailing .* to the pattern. Also add a space to the negated character set so the capture stops at the end of the value:
sed -n 's/.*viewport=\([^& ]*\).*/\1/p' sedtest
You can then pipe this into the same cut as before, although the cut pipeline works even without this change.
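Since the question asked for sed alone, the same idea (match the whole line, capture only what you want) can be pushed one step further so that no cut is needed. A minimal sketch, assuming the viewport values are plain digits; the shortened log line and the line/result variables are made up for illustration:

```shell
# Hypothetical, shortened log line standing in for the real sample.
line='"GET /x.fmil?domain=13nwuc&viewport=10,261,971,0,971,0,10,261 HTTP/1.1", 200, 201'

# Skip the first two values, capture the third and fourth,
# and let the trailing .* swallow the rest of the line.
result=$(printf '%s\n' "$line" |
  sed -n 's/.*viewport=[0-9]*,[0-9]*,\([0-9]*\),\([0-9]*\).*/\1 \2/p')

printf '%s\n' "$result"
```

Because the leading .* is greedy, this picks the last viewport= on a line if there is more than one.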
One way, using grep with a Perl-compatible regex (-P, GNU grep) and awk in a pipe:
< file.txt grep -oP "viewport=[^ ]+" | awk -F "[=,]" '{ print $4, $5 }'
One way using awk alone (RT is a gawk extension, so this needs gawk):
awk -v RS="viewport=[^ ]+" 'RT != "" { split (RT,array,"[=,]"); print array[1 + 3], array[1 + 4] }' file.txt
EDIT:
In the awk-only solution, I made it easier to select the viewport fields of interest. If you would like the 5th and 6th fields, simply change array[1 + 3], array[1 + 4] to array[1 + 5], array[1 + 6]. Also, these solutions have the added advantage of finding multiple occurrences per line.
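To illustrate the point about multiple occurrences, here is a small sketch of the grep/awk pipe run on a made-up line that contains two space-terminated viewport values. The input line and the line/result variables are hypothetical, and plain grep -o is used instead of -P since the pattern needs no Perl features. Note that with -F '[=,]' the literal word viewport is field 1, so the third and fourth values are fields 4 and 5.

```shell
# Hypothetical input: one line holding two space-terminated viewport values.
line='a?viewport=10,261,971,0 HTTP/1.1 b?viewport=1,2,5,6 HTTP/1.1'

# grep -o prints each match on its own line; awk then splits on "=" or ",".
result=$(printf '%s\n' "$line" |
  grep -o 'viewport=[^ ]*' |
  awk -F '[=,]' '{ print $4, $5 }')

printf '%s\n' "$result"
```

Each occurrence produces its own output line. Two viewport parameters inside the same query string (separated by & rather than a space) would not be split apart by this pattern.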
Another awk-only solution:
awk '{split($0,a,"viewport=");split(a[2],b,",");print b[3],b[4]}' filename
yields
971 0
This splits the input line on the string "viewport=" into an array named a, takes the element of a that contains the data after "viewport=" (a[2]), splits it on commas into array b, and then prints the elements we are interested in.