Python: File download using ftplib hangs forever a

I have been trying to troubleshoot an issue where in when we are downloading a file from ftp/ftps. File gets downloaded successfully but no operation is performed post file download completion. No error has occurred which could give more information about the issue. I tried searching for this on stackoverflow and found this link which talks about similar problem statement and looks like I am facing similar issue, though I am not sure. Need little more help in resolving the issue.

I tried setting the FTP connection timeout to 60mins but of less help. Prior to this I was using retrbinary() of the ftplib but same issue occurs there. I tried passing different blocksize and windowsize but with that also issue was reproducible.

I am trying to download the file of size ~3GB from AWS EMR cluster. Sample code is written below.

    def download_ftp(self, ip, port, user_name, password, file_name, target_path):
    try:
        os.chdir(target_path)
        ftp = FTP(host=ip)
        ftp.connect(port=int(port), timeout=3000)
        ftp.login(user=user_name, passwd=password)

        if ftp.nlst(file_name) != []:
            dir = os.path.split(file_name)
            ftp.cwd(dir[0])
            for filename in ftp.nlst(file_name):
                sock = ftp.transfercmd('RETR ' + filename)

                def background():
                    fhandle = open(filename, 'wb')
                    while True:
                        block = sock.recv(1024 * 1024)
                        if not block:
                            break
                        fhandle.write(block)
                    sock.close()

                t = threading.Thread(target=background)
                t.start()
                while t.is_alive():
                    t.join(60)
                    ftp.voidcmd('NOOP')
                logger.info("File " + filename + " fetched successfully")
            return True
        else:
            logger.error("File " + file_name + " is not present in FTP")

    except Exception, e:
        logger.error(e)
        raise

Another option suggested in the above mentioned link is to close the connection post downloading small chunk of the file and then restart the connection. Can someone suggest how can this be achieved, not sure how to resume the download from the same point where the file download was stopped last time before closing the connection. Will this method be full proof of downloading the entire file.

I don't know much about FTP server level timeout settings so didn't know what and how it needs to be altered. I basically want to write a generic FTP down-loader which can help in downloading the files from FTP/FTPS.

When I use retrbinary() method of ftplib and set debug level to 2.

ftp.set_debuglevel(2)
ftp.retrbinary('RETR ' + filename, fhandle.write)

Below logs are getting printed.

cmd 'TYPE I' put 'TYPE I\r\n' get '200 Type set to I.\r\n' resp '200 Type set to I.' cmd 'PASV' put 'PASV\r\n' get '227 Entering Passive Mode (64,27,160,28,133,251).\r\n' resp '227 Entering Passive Mode (64,27,160,28,133,251).' cmd 'RETR FFFT_BRA_PM_R_201711.txt' put 'RETR FFFT_BRA_PM_R_201711.txt\r\n' get '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.\r\n' resp '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.'

Before doing anything, note that there is something very wrong with your connection, and diagnosing that and getting it fixed is far better than working around it. But sometimes, you just have to deal with a broken server, and even sending keepalives doesn't help. So, what can you do?

The trick is to download a chunk at a time, then abort the download—or, if the server can't handle aborting, close and reopen the connection.

Note that I'm testing everything below with ftp://speedtest.tele2.net/5MB.zip, which hopefully this doesn't cause a million people to start hammering their servers. Of course you'll want to test it with your actual server.

Testing for `REST`

The entire solution of course relies on the server being able to resume transfers, which not all servers can do—especially when you're dealing with something badly broken. So we'll need to test for that. Note that this test will be very slow, and very heavy on the server, so do not testing with your 3GB file; find something much smaller. Also, if you can put something readable there, it will help for debugging, because you may be stuck comparing files in a hex editor.

def downit():
    with open('5MB.zip', 'wb') as f:
        while True:
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)

You will probably not get 1MB at a time, but instead something under 8KB. Let's assume you're seeing 1448, then 2896, 4344, etc.

If you get an exception from the REST, the server does not handle resuming—give up, you're hosed.
If the file goes on past the actual file size, hit ^C, and check it in a hex editor.
- If you see the same 1448 bytes or whatever (the amount you saw it printing out) over and over again, again, you're hosed.
- If you have the right data, but with extra bytes between each chunk of 1448 bytes, that's actually fixable. If you run into this and can't figure out how to fix it by using f.seek, I can explain—but you probably won't run into it.

Testing for `ABRT`

One thing we can do is try to abort the download and not reconnect.

def downit():
    with open('5MB.zip', 'wb') as f:
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='test@example.com')
        while True:
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()
            ftp.abort()

You're going to want to try multiple variations:

No sock.close.
No ftp.abort.
With sock.close after ftp.abort.
With ftp.abort after sock.close.
All four of the above repeated with TYPE I moved to before the loop instead of each time.

Some will raise exceptions. Others will just appear to hang forever. If that's true for all 8 of them, we need to give up on aborting. But if any of them works, great!

Downloading a full chunk

The other way to speed things up is to download 1MB (or more) at a time before aborting or reconnecting. Just replace this code:

buf = sock.recv(1024 * 1024)
if buf:
    f.write(buf)

with this:

chunklen = 1024 * 1024
while chunklen:
    print('   ', f.tell())
    buf = sock.recv(chunklen)
    if not buf:
        break
    f.write(buf)
    chunklen -= len(buf)

Now, instead of reading 1442 or 8192 bytes for each transfer, you're reading up to 1MB for each transfer. Try pushing it farther.

Combining with keepalives

If, say, your downloads were failing at 10MB, and the keepalive code in your question got things up to 512MB, but it just wasn't enough for 3GB—you can combine the two. Use keepalives to read 512MB at a time, then abort or reconnect and read the next 512MB, until you're done.