I'm looking for a way in Python (2.7) to do HTTP requests with 3 requirements:
- timeout (for reliability)
- content maximum size (for security)
- connection pooling (for performance)
I've checked almost every Python HTTP library, but none of them meets all my requirements. For instance:
urllib2: good, but no pooling
    import urllib2
    import json

    r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
    content = r.read(100+1)
    if len(content) > 100:
        print 'too large'
        r.close()
    else:
        print json.loads(content)

    r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
    content = r.read(100000+1)
    if len(content) > 100000:
        print 'too large'
        r.close()
    else:
        print json.loads(content)
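The urllib2 trick above (ask for one byte more than the limit, then check whether you got it) is not specific to urllib2: it works on any file-like object with a `read(n)` method. A minimal sketch, assuming a hypothetical helper called `read_limited`; it loops because `read(n)` on some file-like objects may legally return fewer than `n` bytes before the end of the stream.

```python
import io


def read_limited(fileobj, maxsize, chunk_size=8192):
    """Read fileobj to the end, raising ValueError if it holds more than
    maxsize bytes. Hypothetical helper, not part of urllib2."""
    buf = io.BytesIO()
    remaining = maxsize + 1  # one byte past the limit reveals an oversized body
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # an empty read means end of stream
            break
        buf.write(chunk)
        remaining -= len(chunk)
    if remaining <= 0:
        # maxsize + 1 bytes were available, so the body is too large
        raise ValueError('Response too large')
    return buf.getvalue()
```

With urllib2 this would be called as `read_limited(urllib2.urlopen(url, timeout=5), 100000)`; it still gives timeout and size limit, but not pooling.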
requests: no max size
    import requests

    r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
    r.headers.get('content-length')  # missing for this request, and could not be trusted anyway
    content = r.raw.read(100000+1)
    print content  # ARF, this is gzipped, so len(content) is not the real size
    print json.loads(content)  # content is gzipped, so this fails
    print r.json()  # does not work anymore, since raw.read() was already used
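Because `r.raw.read()` returns the bytes as they came off the wire, a limit applied there caps the *compressed* size, and a small gzip body can expand to something much larger. One way to cap the *decompressed* size is to feed the raw chunks through `zlib.decompressobj` with its `max_length` argument. A sketch, assuming the body is gzip-encoded; `gunzip_limited` is a hypothetical name, and truncated streams are not detected here.

```python
import zlib


def gunzip_limited(chunks, maxsize):
    """Decompress an iterable of gzip-encoded byte chunks, raising ValueError
    if the output would exceed maxsize bytes. Hypothetical helper."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    size = 0
    for data in chunks:
        while data:
            # max_length (always >= 1 here) caps the output of this one call
            piece = d.decompress(data, maxsize + 1 - size)
            size += len(piece)
            if size > maxsize:
                raise ValueError('Decompressed response too large')
            out.append(piece)
            # input left over because the max_length cap was reached
            data = d.unconsumed_tail
    tail = d.flush()  # any output still buffered inside the decompressor
    if size + len(tail) > maxsize:
        raise ValueError('Decompressed response too large')
    out.append(tail)
    return b''.join(out)
```

With requests, the compressed chunks could come from `iter(lambda: r.raw.read(2048), b'')`.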
urllib3: I never got the "read" method working, even with a 50 MB file...
httplib: httplib.HTTPConnection is not a pool (only one connection)
I can hardly believe that urllib2 is the best HTTP library I can use! So if anyone knows a library that can do this, or how to use one of the libraries above to do it, please let me know.
EDIT:
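The best solution I found, thanks to Martijn Pieters (StringIO does not slow down even for huge files, whereas repeated str addition slows down a lot):

    from StringIO import StringIO
    import requests

    maxsize = 100000  # maximum accepted body size, in bytes

    # timeout kept for reliability, as in the examples above
    r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
    size = 0
    ctt = StringIO()
    # iter_content() yields decompressed data, so size is the real body size
    for chunk in r.iter_content(2048):
        size += len(chunk)
        ctt.write(chunk)
        if size > maxsize:
            r.close()
            raise ValueError('Response too large')
    content = ctt.getvalue()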
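The same loop can be packaged as a function that accepts any iterable of byte chunks, which makes it testable without a network and lets it wrap `r.iter_content(2048)` directly. A minimal sketch with a hypothetical name, `collect_limited`; it uses `io.BytesIO` instead of `StringIO` so it works on bytes under both Python 2.7 and 3, and it checks the limit before buffering each chunk.

```python
import io


def collect_limited(chunks, maxsize):
    """Join byte chunks into one bytes object, raising ValueError as soon as
    the running total exceeds maxsize. Hypothetical helper."""
    buf = io.BytesIO()  # like StringIO, avoids repeated string concatenation
    size = 0
    for chunk in chunks:
        size += len(chunk)
        if size > maxsize:
            raise ValueError('Response too large')
        buf.write(chunk)
    return buf.getvalue()
```

A call site would then look like `content = collect_limited(r.iter_content(2048), 100000)`, with `r.close()` in a `finally` clause so the connection is released even when the limit is hit.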