Use Case:
Download hundreds of thousands of XML files (from a few bytes up to 50 MB each) with ftplib. The files are organized as /year-month/year-month-day/hours/files. For a given day, I loop through each hour folder, store all of its filenames with ftp.nlst(), and then loop through the filenames and download each file like this:
    import ftplib
    import os

    with open(local_file, 'wb') as fhandle:
        try:
            ftp.retrbinary('RETR ' + filename, fhandle.write)
        except EOFError:
            # The connection was dropped: discard the partial file,
            # reconnect, and retry the download once.
            fhandle.close()
            os.remove(local_file)
            ftp = ftplib.FTP()
            ftp.connect(self.remote_host, self.port, timeout=60)
            ftp.login(self.username, self.passwd, acct="")
            ftp.cwd(self.input_folder + '/' + subdir)
            try:
                with open(local_file, 'wb') as fhandle:
                    ftp.retrbinary('RETR ' + filename, fhandle.write, 8192)
            except Exception:
                self.log.error('i give up !!!')
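The reconnect-and-retry idea in the snippet above could also be written as a bounded loop instead of nested try blocks. This is only a rough sketch: `download_with_retry` and its `connect` argument are hypothetical names, and `connect` is assumed to be a zero-argument callable that returns a logged-in `ftplib.FTP` object already positioned in the right directory.

```python
import ftplib
import os

# Errors that usually mean "the connection is gone or the server is busy".
# Permanent errors such as a missing file (ftplib.error_perm) are not retried.
RETRYABLE = (EOFError, OSError, ftplib.error_temp)


def download_with_retry(connect, filename, local_file, attempts=3, blocksize=8192):
    """Download `filename`, reconnecting and retrying if the transfer fails.

    `connect` is a hypothetical zero-argument callable that returns a
    logged-in FTP object whose current directory contains `filename`.
    Returns True on success, False if every attempt failed.
    """
    ftp = connect()
    try:
        for attempt in range(1, attempts + 1):
            try:
                with open(local_file, 'wb') as fhandle:
                    ftp.retrbinary('RETR ' + filename, fhandle.write, blocksize)
                return True
            except RETRYABLE:
                # Discard the partial file and drop the broken connection.
                if os.path.exists(local_file):
                    os.remove(local_file)
                ftp.close()
                if attempt < attempts:
                    ftp = connect()
        return False
    finally:
        # close() is safe to call on an already closed connection.
        ftp.close()
```

Note that `OSError` also covers local problems such as a full disk, so a real script would probably want to log the exception before retrying.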
Expected:
For each day given as an input folder, download all of its XML files.
What I get:
EOFError
What I already tried:
- I have gone through every post I could find about this subject, on Stack Overflow and on the net in general.
- I have tried closing the connection and opening a new one for each subfolder in the hour folder.
- It doesn't seem to be one specific file that is causing the problem, and it is definitely not the first one. I get this EOFError while downloading files with ftp.retrbinary(). It is related to the fact that I download hundreds of thousands of XML files: when I tested the script with 2,000 files I didn't get any exceptions, but with around 287,000 files I always get it. What I don't understand is that the script downloads about the same number of XML files each time, around 159,000, before it fails.
- I have tried changing the block size in ftp.retrbinary('RETR ' + filename, fhandle.write, 4096).
Question:
Have I missed something? How can I handle this EOFError so that I can continue downloading all my files, without losing my sanity?
I finally found a solution to my problem. Instead of opening a connection for each subfolder, I now open a connection for each file to be downloaded. It is slower, but it gets me past this EOFError. I also found out that the FTP server I download from has restrictions, for example on the number of parallel connections and on how long a connection may last.
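For anyone who wants to try the same thing, here is a minimal sketch of the one-connection-per-file approach. It is not the exact code from my script: `download_one` and its parameters are names made up for this example, and `ftp_factory` exists only so that the function can be exercised without a real server.

```python
import ftplib
import os


def download_one(host, port, user, passwd, remote_dir, filename, local_dir,
                 ftp_factory=ftplib.FTP, timeout=60):
    """Download a single file over its own short-lived FTP connection.

    All names here are illustrative. The connection is opened, used for
    one RETR, and closed again, so it never lives long enough to hit a
    server-side limit on connection duration.
    """
    local_file = os.path.join(local_dir, filename)
    # ftplib.FTP is a context manager: leaving the block sends QUIT
    # (if connected) and closes the socket.
    with ftp_factory() as ftp:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, passwd)
        ftp.cwd(remote_dir)
        try:
            with open(local_file, 'wb') as fhandle:
                ftp.retrbinary('RETR ' + filename, fhandle.write)
        except Exception:
            # Do not leave a truncated file behind.
            if os.path.exists(local_file):
                os.remove(local_file)
            raise
    return local_file
```

The list of filenames can still be fetched once per hour folder with ftp.nlst() on a separate connection, and then `download_one` is called for each name. Opening connections one after another like this also stays within a limit on parallel connections.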