How to make Python check if ftp directory exists?

Posted 2019-03-24 17:19

I'm using this script to connect to a sample FTP server and list the available directories:

from ftplib import FTP
ftp = FTP('ftp.cwi.nl')   # connect to host, default port (some example server, i'll use other one)
ftp.login()               # user anonymous, passwd anonymous@
ftp.retrlines('LIST')     # list directory contents
ftp.quit()

How do I use the output of ftp.retrlines('LIST') to check whether a directory (for example public_html) exists? If it exists, I want to cd into it, execute some other code, and exit; if it does not, I want to execute the code right away and exit.

Tags: python ftp
7 answers
唯我独甜
#2 · 2019-03-24 18:14

=> I found this web page while googling for a way to check if a file exists using ftplib in Python. The following is what I figured out (hope it helps someone):

=> When trying to list non-existent files/directories, ftplib raises an exception. Even though adding a try/except block is standard practice and a good idea, I prefer my FTP scripts to download files only after making sure they exist. This keeps my scripts simpler, at least when listing a directory on the FTP server is possible.

For example, the Edgar FTP server has multiple files stored under the directory /edgar/daily-index/. Each file is named like "master.YYYYMMDD.idx". There is no guarantee that a file will exist for every date (YYYYMMDD): there is no file dated 24 Nov 2013, but there is a file dated 22 Nov 2013. How does listing work in these two cases?

# Code
from __future__ import print_function  
import ftplib  

ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")  
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131122.idx")  
print(resp)   
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")  
print(resp)  

# Output
250-Start of list for /edgar/daily-index/master.20131122.idx  
modify=20131123030124;perm=adfr;size=301580;type=file;unique=11UAEAA398;  
UNIX.group=1;UNIX.mode=0644;UNIX.owner=1019;  
/edgar/daily-index/master.20131122.idx
250 End of list  

Traceback (most recent call last):
File "", line 10, in <module>
resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
File "lib/python2.7/ftplib.py", line 244, in sendcmd
return self.getresp()
File "lib/python2.7/ftplib.py", line 219, in getresp
raise error_perm, resp
ftplib.error_perm: 550 '/edgar/daily-index/master.20131124.idx' cannot be listed

As expected, listing a non-existent file generates an exception.
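
A minimal sketch of turning that exception into a yes/no answer, assuming the server supports MLST and replies 550 for a missing path as in the output above. The function name remote_path_exists is made up for illustration; any other permanent error (for example a server that does not implement MLST) is re-raised rather than treated as "missing".

```python
import ftplib

def remote_path_exists(ftp_client, path):
    """Return True if MLST succeeds for path, False if the server replies 550."""
    try:
        ftp_client.sendcmd("MLST " + path)
    except ftplib.error_perm as err:
        # 550 is the reply seen above for a missing file (assumption: the
        # server uses it consistently); anything else is a different problem.
        if str(err).startswith("550"):
            return False
        raise
    return True
```

With the ftp_client from the code above, remote_path_exists(ftp_client, "/edgar/daily-index/master.20131124.idx") would be expected to return False instead of raising.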

=> Since I know that the Edgar FTP server will surely have the directory /edgar/daily-index/, my script can do the following to avoid raising exceptions due to non-existent files:
a) list this directory.
b) download the required file(s) if they are present in this listing. To check the listing, I typically run a regexp search over the list of strings that the listing operation returns.

For example, this script tries to download files for the past three days. If a file is found for a certain date, it is downloaded; otherwise nothing happens.

import ftplib
import re
from datetime import date, timedelta

ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")
listing = []
# List the directory and store each directory entry as a string in a list
ftp_client.retrlines("LIST /edgar/daily-index", listing.append)
# go back 1, 2 and 3 days
for diff in [1, 2, 3]:
  day = date.today() - timedelta(days=diff)
  file_date = day.strftime("%Y%m%d")
  month = day.strftime("%Y_%m")
  # the absolute path of the file we want to download - if it indeed exists
  file_path = "/edgar/daily-index/master.%(date)s.idx" % { "date": file_date }
  # create a regex to match the file's name (escaped so "." matches a literal dot)
  pattern = re.compile(re.escape("master.%(date)s.idx" % { "date": file_date }))
  # keep the listing entries that match the pattern
  found = [entry for entry in listing if pattern.search(entry)]
  if found:
    # the local directory ./edgar/daily-index/<month>/ must already exist
    local_path = './edgar/daily-index/%(month)s/master.%(date)s.idx' % {
      "month": month, "date": file_date
    }
    with open(local_path, 'wb') as local_file:
      ftp_client.retrbinary(
        "RETR %(file_path)s" % { "file_path": file_path },
        local_file.write
      )
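
The same "list and check" idea can be applied to the directory case from the question. This is only a sketch: it assumes the server returns Unix "ls -l" style LIST lines (nine whitespace-separated fields, with a leading "d" marking directories), which FTP does not guarantee. The helper name directory_in_listing is made up for illustration.

```python
def directory_in_listing(listing, name):
    """Return True if a Unix-style LIST line describes a directory called name."""
    for line in listing:
        # Split into at most 9 fields so that a name containing spaces
        # stays in one piece as the last field.
        fields = line.split(None, 8)
        if len(fields) == 9 and line.startswith("d") and fields[8] == name:
            return True
    return False
```

After collecting the lines with ftp.retrlines('LIST', listing.append), a script could call directory_in_listing(listing, 'public_html') and only then ftp.cwd('public_html').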

=> Interestingly, there are situations where we cannot list a directory on the FTP server. The Edgar FTP server, for example, disallows listing /edgar/data because it contains far too many sub-directories. In such cases the "list and check for existence" approach described here does not work, and I would have to use exception handling in my downloader script to recover from attempts to access non-existent files or directories.
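
For the directory case, the exception-handling fallback might look like the sketch below. It assumes the server answers a failed CWD with a permanent (5xx) error, which ftplib raises as ftplib.error_perm; the function name change_dir_if_exists is made up for illustration. Note that this cannot tell a missing directory apart from one the user is not allowed to enter.

```python
import ftplib

def change_dir_if_exists(ftp_client, path):
    """Try to cd into path; return True on success, False if the server refuses."""
    try:
        ftp_client.cwd(path)
    except ftplib.error_perm:
        # Permanent error from the server: treat the directory as
        # missing or inaccessible and stay where we are.
        return False
    return True
```

A script could call change_dir_if_exists(ftp, 'public_html') once and then run the rest of its code either way, which matches the flow asked for in the question.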
