I would like to retrieve the data inside a gzip-compressed (.gz) file stored on an FTP server, without writing the file to local disk.
At the moment I have:
from ftplib import FTP
import gzip
ftp = FTP('ftp.server.com')
ftp.login()
ftp.cwd('/a/folder/')
fileName = 'aFile.gz'
with open(fileName, 'wb') as localfile:  # close the file so all data is flushed
    ftp.retrbinary('RETR ' + fileName, localfile.write, 1024)
with gzip.open(fileName, 'rb') as f:  # reopen the saved file for reading
    data = f.read()
This, however, writes the downloaded file to local storage.
I tried changing it to:
from ftplib import FTP
import zlib
ftp = FTP('ftp.server.com')
ftp.login()
ftp.cwd('/a/folder/')
fileName = 'aFile.gz'
data = ftp.retrbinary('RETR '+fileName, zlib.decompress, 1024)
but ftp.retrbinary does not return the output of its callback (it returns the server's final response string).
Is there a way to do this?
A simple implementation is to:
- download the file to an in-memory file-like object, like BytesIO;
- pass that to the fileobj parameter of the GzipFile constructor.
The above loads the whole .gz file into memory, which can be inefficient for large files. A smarter implementation would stream the data instead, but that would probably require implementing a custom file-like object.
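A minimal sketch of those two steps, assuming an already logged-in ftplib.FTP connection like the one in the question. The function name read_gz_from_ftp is hypothetical, chosen only for illustration.

```python
import gzip
import io
from ftplib import FTP


def read_gz_from_ftp(ftp: FTP, file_name: str) -> bytes:
    """Hypothetical helper: download a .gz file and return its decompressed bytes.

    The compressed file is buffered in memory (BytesIO); nothing is written to disk.
    """
    buffer = io.BytesIO()
    # retrbinary calls buffer.write once for every block it receives
    ftp.retrbinary('RETR ' + file_name, buffer.write)
    # Rewind: after the download the position is at the end of the buffer
    buffer.seek(0)
    with gzip.GzipFile(fileobj=buffer, mode='rb') as f:
        return f.read()
```

With the question's setup, data = read_gz_from_ftp(ftp, fileName) replaces the open/retrbinary/gzip.open lines. The seek(0) call matters: without it GzipFile would start reading at the end of the buffer and see no data.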
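For the streaming case, one possible alternative (not part of the answer above) avoids a custom file-like object entirely: feed each block to a zlib.decompressobj inside the retrbinary callback. Passing wbits=16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer, which is also why the question's plain zlib.decompress callback would fail on .gz data. The name stream_gz_from_ftp and its handle parameter are hypothetical.

```python
import zlib
from ftplib import FTP
from typing import Callable


def stream_gz_from_ftp(ftp: FTP, file_name: str,
                       handle: Callable[[bytes], None]) -> None:
    """Hypothetical helper: decompress a .gz file block by block as it arrives.

    Only the current block and the decompressor state are held in memory.
    Assumes a single-member gzip file (the usual case).
    """
    # 16 + MAX_WBITS: expect gzip framing rather than a raw zlib stream
    decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)

    def on_block(block: bytes) -> None:
        # Pass each decompressed piece on immediately instead of collecting it
        handle(decompressor.decompress(block))

    ftp.retrbinary('RETR ' + file_name, on_block)
    # Emit anything still buffered inside the decompressor
    handle(decompressor.flush())
```

The handle callback decides what happens to the data, for example writing it to another stream or parsing it incrementally. Note that if the file contains several concatenated gzip members, everything after the first member ends up in decompressor.unused_data and is ignored by this sketch.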
See also Get files names inside a zip file on FTP server without downloading whole archive.