I need to process all files in a directory tree recursively, but with a limited depth.
That means, for example, looking for files in the current directory and the first two subdirectory levels, but no further. In that case, I must process e.g. `./subdir1/subdir2/file`, but not `./subdir1/subdir2/subdir3/file`.
What is the best way to do this in Python 3?
Currently I use `os.walk` to process all files at any depth in a loop like this:

```python
for root, dirnames, filenames in os.walk(args.directory):
    for filename in filenames:
        path = os.path.join(root, filename)
        # do something with that file...
```
I could count the directory separators (`/`) in `root` to determine the current file's level in the hierarchy and break out of the loop once that level exceeds the desired maximum.
However, I suspect this approach is fragile, and it is probably quite inefficient when there is a large number of subdirectories to ignore. What would be the best approach here?
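For comparison, the separator-counting idea can avoid the inefficiency if it prunes instead of skipping: with the default `topdown=True`, `os.walk` only descends into the names left in `dirnames`, so emptying that list in place stops the walk at the limit. A minimal sketch, assuming `max_levels` counts subdirectory levels below `top` (so `2` matches the example above); `walk_limited` is a hypothetical name, and the depth is derived from `os.path.relpath` rather than by counting `/` in the raw path, so it also works with Windows separators.

```python
import os


def walk_limited(top, max_levels):
    """Like os.walk(top), but descend at most max_levels directory levels below top."""
    for root, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(root, top)
        # top itself is depth 0, its immediate subdirectories are depth 1, etc.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_levels:
            # With topdown=True (the default), os.walk only recurses into what is
            # left in dirnames, so emptying it in place prunes everything deeper.
            # As a side effect, dirnames is reported empty at the deepest level.
            dirnames[:] = []
        yield root, dirnames, filenames
```

It can replace `os.walk(args.directory)` in the loop above as `walk_limited(args.directory, 2)`.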
I think the easiest and most stable approach would be to copy the functionality of `os.walk` straight out of its source and add your own depth-controlling parameter.
```python
import os
import os.path as path

def walk(top, topdown=True, onerror=None, followlinks=False, maxdepth=None):
    islink, join, isdir = path.islink, path.join, path.isdir
    try:
        names = os.listdir(top)
    except OSError as err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs

    # maxdepth counts top itself as level 1; None means no limit.
    if maxdepth is None or maxdepth > 1:
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
                for x in walk(new_path, topdown, onerror, followlinks,
                              None if maxdepth is None else maxdepth - 1):
                    yield x

    if not topdown:
        yield top, dirs, nondirs

# maxdepth=2 visits top and its immediate subdirectories;
# maxdepth=3 would also reach ./subdir1/subdir2 from the question.
for root, dirnames, filenames in walk(args.directory, maxdepth=2):
    ...  # process the files here
```
If you're not interested in all those optional parameters, you can pare down the function pretty substantially:
```python
import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for name in os.listdir(top):
        (dirs if os.path.isdir(os.path.join(top, name)) else nondirs).append(name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            for x in walk(os.path.join(top, name), maxdepth - 1):
                yield x

for x in walk(".", 2):
    print(x)
```
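Note how `maxdepth` is counted in these functions: the top directory itself is level 1. The following self-contained check (the temporary tree and the `visited` helper are only for illustration) shows that covering the current directory plus two subdirectory levels, as the question asks, takes `maxdepth=3`, not `2`.

```python
import os
import tempfile


def walk(top, maxdepth):
    # Same pared-down walk as above: yield top, then recurse while maxdepth > 1.
    dirs, nondirs = [], []
    for name in os.listdir(top):
        (dirs if os.path.isdir(os.path.join(top, name)) else nondirs).append(name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            yield from walk(os.path.join(top, name), maxdepth - 1)


def visited(top, maxdepth):
    """Return the directories walk() yields, relative to top (illustrative helper)."""
    return sorted(os.path.relpath(root, top) for root, _, _ in walk(top, maxdepth))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "subdir1", "subdir2", "subdir3"))
    two = visited(tmp, 2)
    three = visited(tmp, 3)

print(two)    # ['.', 'subdir1']
print(three)  # ['.', 'subdir1', 'subdir1/subdir2'] on POSIX
```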
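Starting in Python 3.5, `os.walk` uses `os.scandir` instead of `os.listdir`. This is often much faster, because `DirEntry.is_dir()` can usually answer from data already returned by the directory listing instead of making a separate `stat()` call. I adapted @Kevin's sample accordingly: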
```python
import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for entry in os.scandir(top):
        # Unlike os.walk, this collects full paths (entry.path), not bare names.
        (dirs if entry.is_dir() else nondirs).append(entry.path)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for path in dirs:
            # Recurse one level deeper with a reduced depth budget.
            for x in walk(path, maxdepth - 1):
                yield x

for x in walk(".", 2):
    print(x)
```
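If you want the `scandir` version to stay closer to `os.walk`, here is a sketch under a few assumptions of my own: it yields names rather than full paths, closes the iterator with `with` (supported since Python 3.6), does not descend into symlinked directories (mirroring `os.walk`'s default `followlinks=False`), and silently skips unreadable directories (roughly like `os.walk`'s default `onerror=None`). The name `walk_scandir` is hypothetical.

```python
import os


def walk_scandir(top, maxdepth):
    """Yield (dirpath, dirnames, filenames) for top and at most maxdepth - 1
    levels of subdirectories below it, in the same shape os.walk(top) uses."""
    dirs, nondirs, subdirs = [], [], []
    try:
        with os.scandir(top) as it:
            for entry in it:
                # is_dir() follows symlinks, so a symlink to a directory is
                # listed under dirs, just as os.walk reports it.
                if entry.is_dir():
                    dirs.append(entry.name)
                    # Like os.walk(followlinks=False): never recurse into symlinks.
                    if not entry.is_symlink():
                        subdirs.append(entry.path)
                else:
                    nondirs.append(entry.name)
    except OSError:
        return  # unreadable directory: skip it without yielding anything
    yield top, dirs, nondirs
    if maxdepth > 1:
        for path in subdirs:
            yield from walk_scandir(path, maxdepth - 1)
```

To plug it into the question's loop, iterate over `walk_scandir(args.directory, 3)` and join each `filename` with `root`, exactly as with `os.walk`.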