I am currently working on a program that takes a .pcap file and uses the scapy package to separate the packets by IP address. I want to use the gzip package to decompress the payloads that are gzip-compressed. I can tell that a payload is gzipped because it contains
Content-Encoding: gzip
I am trying to use
fileStream = StringIO.StringIO(payload)
gzipper = gzip.GzipFile(fileobj=fileStream)
data = gzipper.read()
to decompress the payload, where
payload = str(pkt[TCP].payload)
When I try to do this I get the error
IOError: Not a gzipped file
When I print the first payload I get
HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/html; charset=utf-8
P3P: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Sat, 30 Mar 2013 19:23:33 GMT
Content-Length: 15534
Connection: keep-alive
Set-Cookie: _FS=NU=1; domain=.bing.com; path=/
Set-Cookie: _SS=SID=F2652FD33DC443498CE043186458C3FC&C=20.0; domain=.bing.com; path=/
Set-Cookie: MUID=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: MUIDB=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: OrigMUID=2961778241736E4F314E732240626EBE%2c532012b954b64747ae9b83e7ede66522; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHD=D=2758763&MS=2758763&AF=NOFORM; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHUID=V=2&GUID=02F43275DC7F435BB3DF3FD32E181F4D; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: SRCHUSR=AUTOREDIR=0&GEOVAR=&DOB=20130330; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
?}k{?H????+0?#!?,_???$?:?7vf?w?Hb???ƊG???9???/9U?\$;3{9g?ycAӗ???????W{?o?~?FZ?e ]>??<??n?????????????d?t??a?3?
?2?p??eBI?e??????ܒ?P??-?Q?-L?????ǼR?³?ׯ??%'
?2Kf?7???c?Y?I?1+c??,ae]?????<{?=ƞ,?^?J?ď???y??6O?_?z????_?ޞ~?_?????Bo%]???_?????W=?
For additional information, this is a packet that was isolated because it contained Content-Encoding: gzip from a sample .pcap file provided by a project.
To decode a gzipped HTTP response, you need to decompress only the response body, not the headers.
The payload in your case is the entire TCP payload, i.e. the entire HTTP message including headers and body.
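To see why the original attempt fails, it can help to look at the first two bytes of the data: a gzip stream always begins with the magic bytes 0x1f 0x8b, whereas the TCP payload begins with the status line. Below is a minimal sketch, written for Python 3 rather than the Python 2 used in the question. The response is made up for illustration, and the names sample_body, sample_payload, looks_like_gzip and try_gunzip are hypothetical helpers, not part of any library.

```python
import gzip
import io

# Hypothetical stand-in for a captured response: headers followed by a gzipped body.
sample_body = gzip.compress(b"<!DOCTYPE html><html></html>")
sample_payload = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"\r\n"
) + sample_body

GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if data begins with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def try_gunzip(data):
    """Return the decompressed bytes, or None if data is not a gzip stream."""
    try:
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    except OSError:  # "Not a gzipped file" is reported as an OSError (IOError)
        return None


print(looks_like_gzip(sample_payload))  # the whole message starts with b"HT"
print(looks_like_gzip(sample_body))     # the body alone starts with the magic bytes
```

Passing the whole message to try_gunzip yields None, while passing only the body yields the original HTML.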
HTTP messages (requests and responses) use the generic message format defined in RFC 822, the same format that e-mail messages (RFC 2822) are based on.
The structure of an RFC 822 message is very simple:
- Zero or more header lines (key/value pairs separated by :), each terminated by CRLF
- An empty line (CRLF: carriage return, line feed, so '\r\n')
- The message body
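The structure above can be illustrated by splitting a message by hand at the first empty line. This is only a sketch in Python 3: split_http_message is a hypothetical helper, the example bytes are made up, and it assumes the whole header block is present and ignores corner cases such as folded header lines. Because HTTP puts a request or status line in front of the headers, the sketch peels that line off first.

```python
def split_http_message(payload):
    """Split raw HTTP bytes into (start_line, headers, body).

    Assumes the complete header block is contained in payload.
    """
    # The first CRLF CRLF is the end of the last header plus the empty line.
    head, separator, body = payload.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no empty line found; headers are incomplete")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("iso-8859-1")
    headers = []
    for line in lines[1:]:
        # Each header line is a key/value pair separated by the first colon.
        name, _, value = line.partition(b":")
        headers.append((name.strip().decode("iso-8859-1"),
                        value.strip().decode("iso-8859-1")))
    return start_line, headers, body


# Made-up message for illustration only.
example = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Encoding: gzip\r\n"
           b"Content-Length: 3\r\n"
           b"\r\n"
           b"abc")
start_line, headers, body = split_http_message(example)
```

Everything after the empty line is returned untouched as bytes, which is what a decompressor needs.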
You could now parse this message yourself in order to isolate the body. But I would rather recommend you use the tools Python already provides for you. The httplib module (Python 2.x) includes the HTTPMessage class, which is used by httplib internally to parse HTTP responses. It's not meant to be used directly, but in this case I would probably still use it: it will handle some HTTP-specific details for you.
Here's how you can use it to separate the body from the headers:
>>> import StringIO
>>> from httplib import HTTPMessage
>>>
>>> # Binary mode, because the body is compressed binary data:
... f = open('gzipped_response.payload', 'rb')
>>>
>>> # Or, if you already have the payload in memory as a string:
... # f = StringIO.StringIO(payload)
...
>>> status_line = f.readline()
>>> msg = HTTPMessage(f, 0)
>>> body = msg.fp.read()
The HTTPMessage class works in a similar way to rfc822.Message:
- First, you need to read (or discard) the status line (HTTP/1.1 200 OK), because it is not part of the RFC 822 message and is not a header.
- Then you instantiate HTTPMessage with a handle to an open file and the seekable argument set to 0. The file pointer is stored as msg.fp.
- Upon instantiation it calls msg.readheaders(), which reads all header lines until it encounters an empty line (CRLF).
- At that point, msg.fp has been advanced to the point where the headers end and the body starts. You can therefore call msg.fp.read() to read the rest of the message, i.e. the body.
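The same steps can be sketched for Python 3, where httplib has become http.client. This is an adaptation, not the code from this answer: it assumes that http.client.parse_headers is available (it exists in CPython but is not listed in the module's documented public API), and read_http_body and the sample response are made up for illustration.

```python
import gzip
import io
from http.client import parse_headers


def read_http_body(payload):
    """Return (status_line, headers, body) for a raw HTTP response in bytes."""
    fp = io.BytesIO(payload)
    status_line = fp.readline()  # consume the status line; it is not a header
    headers = parse_headers(fp)  # reads header lines up to and including the empty line
    body = fp.read()             # fp is now positioned at the first byte of the body
    return status_line, headers, body


# Made-up response for illustration only.
html = b"<!DOCTYPE html>\n<html>"
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Encoding: gzip\r\n"
       b"\r\n") + gzip.compress(html)

status_line, headers, body = read_http_body(raw)
if headers.get("Content-Encoding", "").lower() == "gzip":
    body = gzip.decompress(body)
```

Checking the Content-Encoding header before decompressing keeps the helper usable for responses that are not gzipped.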
After that, your code for decompressing the gzipped body just works:
>>> import gzip
>>> import StringIO
>>> body_stream = StringIO.StringIO(body)
>>> gzipper = gzip.GzipFile(fileobj=body_stream)
>>> data = gzipper.read()
>>>
>>> print data[:25]
<!DOCTYPE html>
<html>