python parse http response (string)

Posted 2019-01-18 05:05

Question:

I'm using Python 2.7 and I want to parse the fields of HTTP responses that I have as strings, already extracted from a text file. What would be the easiest way? I can parse requests with BaseHTTPServer, but I couldn't find an equivalent for responses.

The responses I have are pretty standard and in the following format:

HTTP/1.1 200 OK
Date: Thu, Jul  3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626

Thanks in advance,

Answer 1:

You might find this useful. Keep in mind that HTTPResponse wasn't designed to be "instantiated directly by user."

Also note that the Content-Length header in your response string may no longer be valid (it depends on how you acquired these responses). This just means that the call to HTTPResponse.read() needs to be given a value larger than the content in order to get it all.

This example is specific to Python 2; in Python 3 the import locations for StringIO and httplib have changed (they now live in io and http.client).

from httplib import HTTPResponse
from StringIO import StringIO

http_response_str = """HTTP/1.1 200 OK
Date: Thu, Jul  3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626"""

class FakeSocket():
    """Minimal socket stand-in: HTTPResponse only calls makefile() on it."""
    def __init__(self, response_str):
        self._file = StringIO(response_str)
    def makefile(self, *args, **kwargs):
        return self._file

source = FakeSocket(http_response_str)
response = HTTPResponse(source)
response.begin()  # parse the status line and headers
print "status:", response.status
print "single header:", response.getheader('Content-Type')
# the len here gives a 'big enough' value to read the whole content
print "content:", response.read(len(http_response_str))
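
For Python 3, a rough equivalent of the same trick might look like the sketch below. It assumes the response is available as bytes with CRLF line endings, and it uses a made-up five-byte body with a matching Content-Length, since http.client raises IncompleteRead when the body is shorter than the header claims. The names RAW_RESPONSE and parse_http_response are illustrative only, not part of any library.

```python
from http.client import HTTPResponse
from io import BytesIO

# Sample response; the body "hello" and Content-Length: 5 are assumptions
# made for this sketch so that the declared length matches the real body.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Thu, Jul  3 15:27:54 2014\r\n"
    b'Content-Type: text/xml; charset="utf-8"\r\n'
    b"Connection: close\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)


class FakeSocket:
    """Minimal socket stand-in that only provides makefile()."""

    def __init__(self, response_bytes):
        self._file = BytesIO(response_bytes)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_http_response(response_bytes):
    """Hypothetical helper: wrap raw bytes and parse status line and headers."""
    response = HTTPResponse(FakeSocket(response_bytes))
    response.begin()
    return response


response = parse_http_response(RAW_RESPONSE)
status = response.status
reason = response.reason
content_type = response.getheader("Content-Type")
body = response.read()  # reads exactly Content-Length bytes

print("status:", status, reason)
print("single header:", content_type)
print("content:", body)
```

If the stored responses are text rather than bytes, they would need to be encoded first (for example with .encode("iso-8859-1")) before being passed to the helper.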


Answer 2:

You might want to consider using python-requests.

Link: http://docs.python-requests.org/en/latest/

Here is an example from http://dancallahan.info/journal/python-requests/

Assuming your responses comply with the HTTP RFC, does this look like something you want to do?

>>> import requests
>>> url = 'http://example.test/'
>>> response = requests.get(url)
>>> response.status_code
200
>>> response.headers['content-type']
'text/html; charset=utf-8'
>>> response.content
'Hello, world!'


Tags: python http