Requests is a really nice library. I'd like to use it for downloading big files (>1 GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is a problem with the following code:
import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return
For some reason it doesn't work this way: it still loads the whole response into memory before saving it to a file.
UPDATE
If you need a small client (Python 2.x/3.x) which can download big files from FTP, you can find it here. It supports multithreading and reconnects (it monitors connections), and it also tunes socket parameters for the download task.
I figured out what should be changed. The trick was to set stream=True in the get() call. After this, the Python process stopped sucking up memory (it stays at around 30 kB regardless of the size of the downloaded file).
Thank you @danodonovan for your syntax; I use it here:
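A minimal sketch of what that looks like (the download_file name and the 8192-byte chunk size are my own choices here, not anything prescribed by requests):

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE: stream=True is the key part; without it requests reads the
    # whole body into memory before get() returns.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            # iter_content() pulls the body off the socket chunk_size
            # bytes at a time, so memory use stays flat.
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename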
See http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow for further reference.
Not exactly what the OP was asking, but... it's ridiculously easy to do that with urllib, either straight to a named file or, if you prefer, to a temporary file; both variants are sketched below.
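A minimal sketch of the direct approach, assuming Python 3's urllib.request (url and local_filename are placeholders):

from urllib.request import urlretrieve

urlretrieve(url, local_filename)

And a sketch of the temporary-file variant; NamedTemporaryFile(delete=False) keeps the file on disk so it can be inspected after the with block:

import shutil
import tempfile
from urllib.request import urlopen

with urlopen(url) as response, tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    # Copy the response body to the temporary file in fixed-size chunks,
    # so the whole download never has to fit in memory at once.
    shutil.copyfileobj(response, tmp_file)

print(tmp_file.name)  # path of the temporary file holding the download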
I watched the process while it ran and saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
It's much easier if you use Response.raw and shutil.copyfileobj(), as sketched below. This streams the file to disk without using excessive memory, and the code is simple.
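A sketch of that approach (the download_file name is mine; stream=True is still needed so requests doesn't read the body up front):

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            # Copy the raw response stream straight to disk.
            shutil.copyfileobj(r.raw, f)
    return local_filename

One caveat: Response.raw is the undecoded stream, so if the server applies a Content-Encoding such as gzip, the bytes written to disk may still be compressed unless you arrange for decoding (for example by reading with decode_content=True).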
Your chunk size could be too large; have you tried dropping that, maybe to 1024 bytes at a time? (Also, you could use with to tidy up the syntax, as in the sketch below.) Incidentally, how are you deducing that the response has been loaded into memory?
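For illustration, the OP's function with a with block and a (hypothetical) 1024-byte chunk size would look roughly like this:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    # Note: as the accepted answer explains, stream=True on get() is what
    # actually keeps the response out of memory; this sketch only shows
    # the tidier syntax and the smaller chunk size.
    r = requests.get(url)
    with open(local_filename, 'wb') as f:  # file is closed automatically, even on error
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return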
It sounds as if Python isn't flushing the data to the file; going by other SO questions, you could try f.flush() and os.fsync() to force the file write and free memory, as in the sketch below.
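A sketch of that suggestion applied inside the write loop (local_filename and r are the variables from the OP's code):

import os

with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()             # push Python's internal buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit the bytes to disk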