How can I make os.walk traverse the directory tree of an FTP database (located on a remote server)? The way the code is structured now is (comments provided):
import fnmatch, os, ftplib

def find(pattern, startdir=os.curdir): #find function taking variables for both desired file and the starting directory
    for (thisDir, subsHere, filesHere) in os.walk(startdir): #each of the variables change as the directory tree is walked
        for name in subsHere + filesHere: #going through all of the files and subdirectories
            if fnmatch.fnmatch(name, pattern): #if the name of one of the files or subs is the same as the inputted name
                fullpath = os.path.join(thisDir, name) #fullpath equals the concatenation of the directory and the name
                yield fullpath #return fullpath but anew each time

def findlist(pattern, startdir=os.curdir, dosort=False):
    matches = list(find(pattern, startdir)) #find with arguments pattern and startdir put into a list data structure
    if dosort: matches.sort() #isn't dosort automatically False? Is this statement any different from the same thing but with a line in between
    return matches

#def ftp(
#specifying where to search.

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print(name)
I am thinking that I need to define a new function (i.e., def ftp()) to add this functionality to the code above. However, I am afraid that the os.walk function will, by default, only walk the directory trees of the computer that the code is run from.

Is there a way that I can extend the functionality of os.walk to be able to traverse a remote directory tree (via FTP)?

I'm going to assume this is what you want, although really I have no idea. This will require the remote server to have the mlocate package installed: `sudo apt-get install mlocate; sudo updatedb`

All you need is Python's ftplib module. Since os.walk() works top-down, listing the directories and file names at each level before descending into the subdirectories, you need to find the directories and file names at each iteration and then continue the traversal recursively, starting from the first directory. I implemented this algorithm about two years ago as the heart of FTPwalker, a package optimized for traversing extremely large directory trees through FTP.

To use this class, simply create a connection object with the ftplib module, pass that object to an FTPWalk object, and loop over its walk() function:
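
For illustration, here is a minimal sketch of the same idea. It is not the FTPwalker code. The names `ftp_walk` and `ftp_find` are hypothetical, and the sketch assumes the server supports the MLSD command (which `ftplib.FTP.mlsd()` relies on) so that directories can be told apart from files. Remote paths are joined with `posixpath`, on the assumption that the server uses forward slashes.

```python
import fnmatch
import posixpath


def ftp_walk(ftp, top="/"):
    """Yield (dirpath, dirnames, filenames) for a remote tree, like os.walk.

    `ftp` is assumed to be a logged-in ftplib.FTP connection (or any
    object with a compatible mlsd() method).
    """
    dirnames, filenames = [], []
    # mlsd() yields (name, facts) pairs; facts["type"] is typically
    # "dir", "file", "cdir" (current dir) or "pdir" (parent dir).
    for name, facts in ftp.mlsd(top):
        kind = facts.get("type")
        if kind == "dir":
            dirnames.append(name)
        elif kind == "file":
            filenames.append(name)
    yield top, dirnames, filenames
    # Descend into each subdirectory in turn, top-down.
    for name in dirnames:
        yield from ftp_walk(ftp, posixpath.join(top, name))


def ftp_find(ftp, pattern, startdir="/"):
    """Remote counterpart of find() from the question (hypothetical helper)."""
    for this_dir, subs_here, files_here in ftp_walk(ftp, startdir):
        for name in subs_here + files_here:
            if fnmatch.fnmatch(name, pattern):
                yield posixpath.join(this_dir, name)
```

Against a real server, this would be used roughly as `ftp = ftplib.FTP(host); ftp.login()` followed by `for path in ftp_find(ftp, "*.txt", "/pub"): print(path)`, where the host and start directory are placeholders. If the server does not support MLSD, the directory test would have to be done another way, for example by parsing the output of LIST or by attempting `cwd()` into each name.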