Decompressing a gzipped payload of a packet with Python

I am working on a program that reads a .pcap file and separates the packets by IP address using the scapy package. I want to use the gzip package to decompress the payloads that are gzip-compressed. I can tell that a payload is gzipped because it contains

Content-Encoding: gzip

I am trying to use

fileStream = StringIO.StringIO(payload)
gzipper = gzip.GzipFile(fileobj=fileStream)
data = gzipper.read()

to decompress the payload, where

payload = str(pkt[TCP].payload)

When I try to do this I get the error

IOError: Not a gzipped file

When I print the first payload I get

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/html; charset=utf-8
P3P: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Sat, 30 Mar 2013 19:23:33 GMT
Content-Length: 15534
Connection: keep-alive
Set-Cookie: _FS=NU=1; domain=.bing.com; path=/
Set-Cookie: _SS=SID=F2652FD33DC443498CE043186458C3FC&C=20.0; domain=.bing.com; path=/
Set-Cookie: MUID=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: MUIDB=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: OrigMUID=2961778241736E4F314E732240626EBE%2c532012b954b64747ae9b83e7ede66522; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHD=D=2758763&MS=2758763&AF=NOFORM; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHUID=V=2&GUID=02F43275DC7F435BB3DF3FD32E181F4D; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: SRCHUSR=AUTOREDIR=0&GEOVAR=&DOB=20130330; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/

?}k{?H????+0?#!?,_???$?:?7vf?w?Hb???ƊG???9???/9U?\$;3{9g?ycAӗ???????W{?o?~?FZ?e ]>??<??n????׻?????????d?t??a?3?
?2?p??eBI?e??????ܒ?P??-?Q?-L?????ǼR?³?ׯ??%'
?2Kf?7???c?Y?I?1+c??,ae]?????<{?=ƞ,?^?J?ď???y??6O?_?z????_?ޞ~?_?????Bo%]???_?????W=?

For additional context, this packet comes from a sample .pcap file provided by a project, and it was isolated because it contains Content-Encoding: gzip.

To decompress a gzipped HTTP response, you need to decompress only the response body, not the headers.

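The payload in your case is the entire TCP payload, i.e. the entire HTTP message including headers and body. That is why GzipFile raises "Not a gzipped file": the data it sees starts with the status line and headers rather than with the gzip magic bytes.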

HTTP messages (requests and responses) follow the RFC 822 message format, the same generic format that e-mail messages (RFC 2822) are based on.

The structure of an 822 message is very simple:

Zero or more header lines (key/value pairs separated by :), each terminated by CRLF
An empty line (just CRLF: carriage return, line feed, so '\r\n')
The message body
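
If you do want to isolate the body by hand, the structure above can be sketched as follows. This is a minimal illustration, not something httplib provides: split_http_message is a hypothetical helper, the sample bytes are made up, and it assumes the whole header section is present in the payload. It ignores folded header lines and keeps only the last value of repeated headers such as Set-Cookie.

```python
def split_http_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body).

    Hypothetical helper for illustration only.
    """
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line found; headers may be incomplete")

    lines = head.split(b"\r\n")
    start_line = lines[0]  # e.g. the status line; not a header

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalize to lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Made-up sample message; the body here is not real gzip data.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Encoding: gzip\r\n"
       b"\r\n"
       b"\x1f\x8b...body bytes...")
start_line, headers, body = split_http_message(raw)
```

Only body would then be handed to the gzip code, and only when headers says the content encoding is gzip.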

You could now parse this message yourself to isolate the body, but I recommend using the tools Python already provides. The httplib module (Python 2.x) includes the HTTPMessage class, which httplib uses internally to parse HTTP responses. It's not meant to be used directly, but in this case I would probably still use it, since it handles some HTTP-specific details for you.

Here's how you can use it to separate the body from the headers:

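>>> import StringIO
>>> from httplib import HTTPMessage
>>>
>>> # Open in binary mode so the compressed body is read unmodified
>>> f = open('gzipped_response.payload', 'rb')
>>>
>>> # Or, if you already have the payload in memory as a string:
... # f = StringIO.StringIO(payload)
...
>>> status_line = f.readline()   # consume "HTTP/1.1 200 OK"
>>> msg = HTTPMessage(f, 0)      # reads the headers up to the empty line
>>> body = msg.fp.read()         # everything after the headers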

The HTTPMessage class works much like rfc822.Message does:

First, you need to read (or discard) the status line (HTTP/1.1 200 OK), because it is not part of the RFC 822 message and is not a header.
Then you instantiate HTTPMessage with a handle to an open file and the seekable argument set to 0. The file pointer is stored as msg.fp.
Upon instantiation it calls msg.readheaders(), which reads all header lines until it encounters an empty line (CRLF).
At that point, msg.fp has been advanced to the point where the headers end and the body starts. You can therefore call msg.fp.read() to read the rest of the message - the body.
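
For readers on Python 3, where httplib was renamed http.client and HTTPMessage is built on email.message.Message, the same four steps can be approximated with the standard email parser. This is a sketch under assumptions: read_response is a hypothetical helper name, and the payload is assumed to contain the complete header section.

```python
import io
from email.parser import BytesParser


def read_response(raw):
    """Return (status_line, headers, body) from raw HTTP response bytes.

    Hypothetical helper mirroring the steps described above.
    """
    fp = io.BytesIO(raw)

    # Step 1: the status line is not a header, so read it separately.
    status_line = fp.readline().rstrip(b"\r\n")

    # Steps 2 and 3: collect header lines until the empty line.
    header_lines = []
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        header_lines.append(line)
    headers = BytesParser().parsebytes(b"".join(header_lines),
                                       headersonly=True)

    # Step 4: fp now points at the start of the body.
    body = fp.read()
    return status_line, headers, body
```

Header lookups on the returned object are case-insensitive, so headers["Content-Encoding"] and headers["content-encoding"] give the same value.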

After that, your code for decompressing the gzipped body just works:

>>> import gzip
>>> import StringIO
>>> body_stream = StringIO.StringIO(body)
>>> gzipper = gzip.GzipFile(fileobj=body_stream)
>>> data = gzipper.read()
>>>
>>> print data[:25]
<!DOCTYPE html>
<html>
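
One thing to watch for, stated here as an assumption about the capture rather than something the post confirms: pkt[TCP].payload holds a single TCP segment, and a 15534-byte body will typically span several segments, so the body read above may be only the first part of the gzip stream. The sketch below uses zlib.decompressobj, which returns whatever it can decode from an incomplete stream. gunzip_available is a hypothetical helper name, the sample data is made up, and the code is written for Python 3.

```python
import gzip
import zlib


def gunzip_available(body):
    """Decompress as much of a (possibly truncated) gzip body as possible.

    Hypothetical helper for illustration only.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    # instead of a raw zlib stream.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(body)


# Made-up sample: compress some text, then keep only the first half to
# imitate a body cut off at a segment boundary.
sample = b"".join(str(i).encode() + b"\n" for i in range(5000))
compressed = gzip.compress(sample)
partial = gunzip_available(compressed[:len(compressed) // 2])
print(len(partial), "of", len(sample), "bytes recovered")
```

To recover the full document, the segments of the TCP stream would need to be reassembled in order first, and the joined body passed to the same function.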