I have been trying to troubleshoot an issue where in when we are downloading a file from ftp/ftps. File gets downloaded successfully but no operation is performed post file download completion. No error has occurred which could give more information about the issue. I tried searching for this on stackoverflow and found this link which talks about similar problem statement and looks like I am facing similar issue, though I am not sure. Need little more help in resolving the issue.
I tried setting the FTP connection timeout to 60mins but of less help. Prior to this I was using retrbinary() of the ftplib but same issue occurs there. I tried passing different blocksize and windowsize but with that also issue was reproducible.
I am trying to download the file of size ~3GB from AWS EMR cluster. Sample code is written below.
def download_ftp(self, ip, port, user_name, password, file_name, target_path):
try:
os.chdir(target_path)
ftp = FTP(host=ip)
ftp.connect(port=int(port), timeout=3000)
ftp.login(user=user_name, passwd=password)
if ftp.nlst(file_name) != []:
dir = os.path.split(file_name)
ftp.cwd(dir[0])
for filename in ftp.nlst(file_name):
sock = ftp.transfercmd('RETR ' + filename)
def background():
fhandle = open(filename, 'wb')
while True:
block = sock.recv(1024 * 1024)
if not block:
break
fhandle.write(block)
sock.close()
t = threading.Thread(target=background)
t.start()
while t.is_alive():
t.join(60)
ftp.voidcmd('NOOP')
logger.info("File " + filename + " fetched successfully")
return True
else:
logger.error("File " + file_name + " is not present in FTP")
except Exception, e:
logger.error(e)
raise
Another option suggested in the above mentioned link is to close the connection post downloading small chunk of the file and then restart the connection. Can someone suggest how can this be achieved, not sure how to resume the download from the same point where the file download was stopped last time before closing the connection. Will this method be full proof of downloading the entire file.
I don't know much about FTP server level timeout settings so didn't know what and how it needs to be altered. I basically want to write a generic FTP down-loader which can help in downloading the files from FTP/FTPS.
When I use retrbinary() method of ftplib and set debug level to 2.
ftp.set_debuglevel(2)
ftp.retrbinary('RETR ' + filename, fhandle.write)
Below logs are getting printed.
cmd 'TYPE I' put 'TYPE I\r\n' get '200 Type set to I.\r\n' resp '200 Type set to I.' cmd 'PASV' put 'PASV\r\n' get '227 Entering Passive Mode (64,27,160,28,133,251).\r\n' resp '227 Entering Passive Mode (64,27,160,28,133,251).' cmd 'RETR FFFT_BRA_PM_R_201711.txt' put 'RETR FFFT_BRA_PM_R_201711.txt\r\n' get '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.\r\n' resp '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.'