UPDATE #1
The code in the question works pretty well on a stable connection (such as a local network or intranet).
UPDATE #2
I implemented the FTPClient class with ftplib. It can:
- monitor download progress
- reconnect after a timeout or disconnect
- make several attempts to download a file
- show the current download speed.
After reconnecting, it resumes the download from the point of disconnection (if the FTP server supports it). For details, see my answer below.
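The resume-after-reconnect idea can be sketched roughly as follows. This is not the FTPClient class itself, only a minimal illustration that assumes the server supports the REST command. The function name `download_with_resume` and the `connect` factory (a callable returning a logged-in `ftplib.FTP` object) are hypothetical names chosen for this example.

```python
import ftplib
import os
import time


def download_with_resume(connect, remote_name, local_path,
                         max_attempts=5, delay=5.0):
    """Download remote_name to local_path, resuming after connection errors.

    `connect` is a hypothetical zero-argument callable that returns a
    logged-in ftplib.FTP object, so every attempt uses a fresh connection.
    Returns True on success, False if all attempts failed.
    """
    for attempt in range(max_attempts):
        # Bytes already on disk tell us where to resume from.
        offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
        ftp = None
        try:
            ftp = connect()
            with open(local_path, 'ab') as f:
                # A non-None `rest` makes ftplib send "REST <offset>" before
                # RETR; a server without REST support answers with an error.
                ftp.retrbinary('RETR ' + remote_name, f.write,
                               rest=offset or None)
            return True
        except ftplib.all_errors:
            # Timeout, dropped connection, or FTP error: wait, then retry.
            time.sleep(delay)
        finally:
            if ftp is not None:
                ftp.close()
    return False
```

Opening the local file in append mode keeps the bytes already received, so each retry only asks the server for the remainder.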
Question
I have to implement a Python task that downloads a batch of large files daily (0.3-1.5 GB per file, 200-300 files) via FTP and then processes them. I implemented it with ftplib, but from time to time it hangs and cannot complete the download of some files. To fix this I started experimenting with KEEPALIVE settings, but I still haven't gotten a good result:
    import ftplib
    import logging
    import os
    import socket
    from contextlib import closing

    # Body of the download method; self, local_filename, orig_filename,
    # filename and file_ext are defined by the surrounding code.
    with closing(ftplib.FTP()) as ftp:
        try:
            ftp.connect(self.host, self.port, 30*60)  # 30 mins timeout
            # print ftp.getwelcome()
            ftp.login(self.login, self.passwd)
            ftp.set_pasv(True)
            # Enable TCP keepalive on the control connection
            ftp.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
            ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
            ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
            with open(local_filename, 'w+b') as f:
                res = ftp.retrbinary('RETR %s' % orig_filename, f.write)
                if not res.startswith('226 Transfer complete'):
                    logging.error('Download of file {0} is not complete.'.format(orig_filename))
                    os.remove(local_filename)
                    return None
            # Move the finished file into storage and mark it as copied on the server
            os.rename(local_filename, self.storage + filename + file_ext)
            ftp.rename(orig_filename, orig_filename + '.copied')
            return filename + file_ext
        except Exception:
            logging.exception('Error during download from FTP')
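For reference, the keepalive part of the snippet can be pulled into a small helper. This is only a sketch: the name `enable_keepalive` is made up for this example, and it sets each TCP option only when the running platform's `socket` module defines it, since not every OS exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` or `TCP_KEEPCNT`. Note that `ftp.sock` is the control connection, so these settings do not apply to the separate data connection.

```python
import socket


def enable_keepalive(sock, idle=60, interval=75, count=None):
    """Turn on TCP keepalive for `sock` (hypothetical helper).

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is dropped
    Options the platform does not define, or whose value is None, are skipped.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    wanted = (
        ('TCP_KEEPIDLE', idle),
        ('TCP_KEEPINTVL', interval),
        ('TCP_KEEPCNT', count),
    )
    for name, value in wanted:
        option = getattr(socket, name, None)  # None if not defined here
        if option is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With ftplib it would be called as `enable_keepalive(ftp.sock)` after `ftp.connect(...)`.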
Details
- A file usually takes 7-15 minutes to download.
- The FTP server logs always show the files as fully downloaded, but the client side hangs. This happens intermittently, not on every download.
Questions
- Could this be caused by a disconnect?
- How can I monitor the download process and reconnect if the connection drops?
Because I couldn't find any good suggestions or code samples, I implemented my own solution. Many thanks to the Stack Overflow community for the ideas I used in my code. I put the code on GitHub (pyFTPclient) because of its size (~120 lines).
I tested the solution on a poor-quality network (including 3G mobile internet) and it worked fine for me. Of course, it may still have some bugs.
I would appreciate any comments or suggestions. Thank you in advance.
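To give an idea of the monitoring part without reading the whole repository, here is a simplified sketch. It is not the pyFTPclient code; the class name `ProgressMonitor` and its methods are invented for this illustration. It wraps the write callback passed to `retrbinary`, counts the bytes received, and remembers when the last chunk arrived so that a separate watchdog could notice a stalled transfer.

```python
import time


class ProgressMonitor:
    """Wrap a write callback to track bytes received, speed and stalls."""

    def __init__(self, write, clock=time.monotonic):
        self._write = write          # e.g. the write method of an open file
        self._clock = clock          # injectable for testing
        self._start = clock()
        self.last_activity = self._start
        self.received = 0

    def __call__(self, chunk):
        # Called by ftplib for every block of data received.
        self._write(chunk)
        self.received += len(chunk)
        self.last_activity = self._clock()

    def speed(self):
        """Average download speed in bytes per second since creation."""
        elapsed = self._clock() - self._start
        return self.received / elapsed if elapsed > 0 else 0.0

    def stalled(self, timeout):
        """True if no data has arrived for more than `timeout` seconds."""
        return self._clock() - self.last_activity > timeout
```

It would be used as `monitor = ProgressMonitor(f.write)` followed by `ftp.retrbinary('RETR ' + name, monitor)`. A background thread can poll `monitor.stalled(...)` and `monitor.speed()` while the transfer runs and trigger a reconnect when the transfer stops making progress.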