I've been searching for a couple of days and haven't found an answer yet.
I'm trying to download video files from an FTP server. My script checks the server, compares the nlist() to a list of already-downloaded files parsed from a text file, builds a new list of files to get, and iterates over it, downloading each file and disconnecting from the server before reconnecting for the next one (I thought a server timeout might be the issue, so I quit() the connection after each download).
This works for the first few files, but as soon as I hit a file that takes longer than 5 minutes, ftplib just hangs at the end of the transfer (I can see in Explorer that the file is the correct size, so the download has completed, but the script doesn't seem to get the message and move on to the next file).
Any help would be greatly appreciated; my code is below:
```python
newPath = "Z:\\pathto\\downloads\\"

for f in getFiles:
    print("Getting " + f)

for f in getFiles:
    fil = f.rstrip()
    ext = os.path.splitext(fil)[1]
    if ext in validExtensions:
        print("Downloading new file: " + fil)
        downloadFile(fil, newPath)
```
Here is download.py:
```python
from ftplib import FTP

def downloadFile(filename, folder):
    myhost = 'host'
    myuser = 'user'
    passw = 'pass'
    # login
    ftp = FTP(myhost, myuser, passw)
    localfile = open(folder + filename, 'wb')
    ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    print("Downloaded " + filename)
    localfile.close()
    ftp.quit()
```
Without more information, I can't actually debug your problem, so I can only suggest the most general answer. It is probably more than you need, but it should be sufficient for almost anyone.
`retrbinary` will block until the entire file is done. If that's longer than 5 minutes, nothing will get sent over the control channel for the entire 5 minutes, so either your client is timing out the control channel, or the server is. When you then try to hang up with `ftp.quit()`, it will either hang forever or raise an exception.

You can control your side's timeout with the `timeout` argument on the `FTP` constructor. Some servers support an `IDLE` command that lets you set the server-side timeout. But even if the appropriate one turns out to be doable, how do you pick an appropriate timeout in the first place?

What you really want is to prevent the control socket from timing out while a transfer is happening on the data socket. But how? If you, e.g.,
call `ftp.voidcmd('NOOP')` every so often in your callback function, that will be enough to keep the connection alive… but it also forces you to block until the server responds to the `NOOP`, which many servers will not do until the data transfer is complete, which means you'll just end up blocking forever (or until a different timeout) and not getting your data.

The standard techniques for handling two sockets without one blocking on the other are a multiplexer like `select.select` or threads. You can do that here, but you will have to give up the simple `retrbinary` interface and instead use `transfercmd` to get the data socket explicitly. For example:
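The answer's original example did not survive here, so what follows is my own sketch of that select-based approach. The function name, the `noop_interval`, and the reply-draining loop at the end are all my assumptions, untested against a real server; the key idea is that `putcmd('NOOP')` sends the keepalive without waiting for a reply, so the data socket never blocks on the control socket.

```python
import select
import time
from ftplib import FTP

def download_with_keepalive(host, user, passwd, filename, localpath,
                            blocksize=8192, noop_interval=120):
    """Sketch: download `filename` while keeping the control channel alive."""
    ftp = FTP(host, user, passwd)
    ftp.voidcmd('TYPE I')                       # binary mode before RETR
    conn = ftp.transfercmd('RETR ' + filename)  # the data socket, explicitly
    noops_sent = 0
    last_noop = time.monotonic()
    with open(localpath, 'wb') as f:
        while True:
            # Wait up to 1 second for data so we can check the keepalive clock.
            readable, _, _ = select.select([conn], [], [], 1.0)
            if readable:
                data = conn.recv(blocksize)
                if not data:                    # server closed the data socket
                    break
                f.write(data)
            if time.monotonic() - last_noop >= noop_interval:
                ftp.putcmd('NOOP')              # send, but do NOT wait for a reply
                noops_sent += 1
                last_noop = time.monotonic()
    conn.close()
    # Drain the queued replies: one per NOOP sent, plus the RETR completion.
    for _ in range(noops_sent + 1):
        ftp.getresp()
    ftp.quit()
```

Note that servers are free to queue the `NOOP` replies until the transfer finishes, which is why the drain loop runs after the data socket closes rather than inside the transfer loop.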
An alternative solution would be to read, say, 20 MB at a time, then call `ftp.abort()`, and use the `rest` argument to resume the transfer with each new `retrbinary` until you reach the end of the file. However, `ABOR` could hang forever, just like that `NOOP`, so that doesn't guarantee anything; not to mention that servers don't have to respond to it.

What you could do is just close the whole connection down (not `quit`, but `close`). This is not very nice to the server, may result in some wasted data being re-sent, and may also prevent TCP from doing its usual ramp-up to full speed if you kill the sockets too quickly. But it should work. See this answer, and notice that it requires a bit of testing against your particular broken server to figure out which variation, if any, works correctly and efficiently.