This is based on another question on this site: What's the best way to download a file using urllib3? However, I cannot comment there, so I am asking a new question:
How to download a (large) file with urllib3?
I tried to use the same code that works with urllib2 (Download file from web in Python 3), but it fails with urllib3:
http = urllib3.PoolManager()

with http.request('GET', url) as r, open(path, 'wb') as out_file:
    #shutil.copyfileobj(r, out_file) # this writes a zero-byte file
    shutil.copyfileobj(r.data, out_file)
This fails with: 'bytes' object has no attribute 'read'.
I then tried the code from that question, but it gets stuck in an infinite loop, because data is always an empty bytes object (b''), never None:
http = urllib3.PoolManager()
r = http.request('GET', url)

with open(path, 'wb') as out:
    while True:
        data = r.read(4096)
        if data is None:
            break
        out.write(data)

r.release_conn()
However, if I read everything into memory first, the file gets downloaded correctly:
http = urllib3.PoolManager()
r = http.request('GET', url)

with open(path, 'wb') as out:
    out.write(r.data)
I do not want to do this, as I may potentially be downloading very large files. It is unfortunate that the urllib3 documentation does not cover the best practice for this.
(Also, please do not suggest requests or urllib2, because they are not flexible enough when it comes to self-signed certificates.)
You were very close; the piece that was missing is setting preload_content=False (this will be the default in an upcoming version). Also, you can treat the response as a file-like object rather than using the .data attribute (which is a magic property that will hopefully be deprecated someday). This code should work:
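import shutil
import urllib3

# url and path are assumed to be defined as in the question
http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)  # stream instead of buffering the body in memory

with open(path, 'wb') as out_file:
    shutil.copyfileobj(r, out_file)  # the response is file-like when not preloaded

r.release_conn()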
urllib3's response object also respects the io interface, so you can also do things like:
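import io
import urllib3

http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

# the response can be wrapped like any other readable stream
buffered = io.BufferedReader(r, 2048)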
As long as you add preload_content=False to any of your three attempts and treat the response as a file-like object, they should all work. For example, here is your chunked-read loop with the flag added and the end-of-stream check fixed (read() returns an empty bytes object at the end, not None):
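http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

with open(path, 'wb') as out:
    while True:
        data = r.read(4096)
        if not data:  # read() returns b'' once the stream is exhausted
            break
        out.write(data)

r.release_conn()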
You're totally right; I hope you'll consider helping us document this use case by sending a pull request here: https://github.com/shazow/urllib3