I want to build a program that uses some basic code to read through a folder and tell me how many files are in the folder.
Here is how I do that currently:
import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
This works great until there are multiple folders inside the "main" folder, because it then returns a long, junky list due to poor folder/file management. So I would like to go at most two levels deep. Example:
Main Folder
---file_i_want
---file_i_want
---Sub_Folder
------file_i_want <--*
------file_i_want <--*
------Sub_Folder_2
---------file_i_dont_want
---------file_i_dont_want
I know how to go only to the first level, with a break or with del dirs[:], taken from this post and also this post.
import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        del dirs[:]  # or a break here; it does the same thing
But no matter how much I search, I can't find out how to go two layers deep. I may just not be understanding the other posts on it. I was thinking of something like del dirs[:2], but to no avail. Can someone guide me or explain to me how to accomplish this?
You could do it like this:
for root, dirs, files in os.walk(stuff):
    # only handle folders whose path relative to stuff has fewer than 2 separators
    if root[len(stuff)+1:].count(os.sep) < 2:
        for f in files:
            print(os.path.join(root, f))
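The key is: if root[len(stuff)+1:].count(os.sep) < 2. It removes stuff + separator from root, so the result is relative to stuff. Just count the number of path separators, and don't enter the condition unless you get 0 or 1 separators (0 for the top folder and its direct subfolders, 1 for the level below that; lower the limit to 1 if you want to stop at the direct subfolders).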
Of course, it still scans the full file structure, but unless the tree is very deep, that'll work.
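To make the counting concrete, here is a small sketch of what the slice-and-count expression returns. The helper name rel_depth and the sample folder names are made up for illustration, and it assumes the top folder is given without a trailing separator.

```python
import os

def rel_depth(root, top):
    # Strip "top" plus one separator, then count the separators that remain.
    return root[len(top) + 1:].count(os.sep)

top = "main"  # hypothetical top folder, no trailing separator
print(rel_depth(top, top))                              # 0: the top folder itself
print(rel_depth(os.path.join(top, "sub"), top))         # 0: a direct subfolder
print(rel_depth(os.path.join(top, "sub", "sub2"), top)) # 1: one level further down
```

So a test of < 2 accepts three folder levels in total, and < 1 accepts only the top folder and its direct subfolders.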
Another solution would be to use os.listdir recursively (with a directory check) and a maximum recursion level, but that's a little trickier if you don't need it. Since it's not that hard, here's one implementation:
import os

def scanrec(root):
    rval = []

    def do_scan(start_dir, output, depth=0):
        for f in os.listdir(start_dir):
            ff = os.path.join(start_dir, f)
            if os.path.isdir(ff):
                # recurse only while we are fewer than 2 levels below root
                if depth < 2:
                    do_scan(ff, output, depth + 1)
            else:
                output.append(ff)

    do_scan(root, rval, 0)
    return rval

print(scanrec(stuff))  # prints the list of files not below 2 deep
Note: os.listdir followed by os.path.isdir costs an extra stat call per entry, so this is not optimal. From Python 3.5 on, os.scandir can avoid that extra call.
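For what it's worth, the os.scandir variant mentioned in the note could look roughly like this. It's only a sketch: the function name scanrec_scandir and the max_depth parameter are my own additions, and using os.scandir as a context manager assumes Python 3.6 or later.

```python
import os

def scanrec_scandir(root, max_depth=2):
    """Return the files found at most max_depth folder levels below root."""
    found = []

    def do_scan(start_dir, depth):
        # Each DirEntry caches its file type, so is_dir() usually
        # needs no extra stat call.
        with os.scandir(start_dir) as entries:
            for entry in entries:
                if entry.is_dir():
                    if depth < max_depth:
                        do_scan(entry.path, depth + 1)
                else:
                    found.append(entry.path)

    do_scan(root, 0)
    return found
```

Calling scanrec_scandir(stuff) should give the same files as scanrec(stuff), possibly in a different order, and scanrec_scandir(stuff, max_depth=1) stops at the direct subfolders.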
You can count the separators and, if it's two levels deep, delete the content of dirs so walk doesn't recurse deeper:
import os

MAX_DEPTH = 2
folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        # at the deepest allowed level, empty dirs so walk() stops descending
        if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
            del dirs[:]
The Python documentation states the following about this behavior:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
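Note that you need to take into account the separators already present in the entries of folders. For example, when y:\path1 is walked, the first root is y:\path1, which already contains a separator, but you don't want to stop recursion there. That is why stuff.count(os.sep) is subtracted.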
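If you need this in more than one place, the pruning can be wrapped in a small generator. This is just a sketch: walk_limited is a name I made up, it assumes top has no trailing separator and uses os.sep throughout, and it tests with >= instead of == so that a max_depth below 1 still stops at the top folder.

```python
import os

def walk_limited(top, max_depth):
    """Like os.walk(top), but visits at most max_depth folder levels."""
    base = top.count(os.sep)  # separators already present in the start path
    for root, dirs, files in os.walk(top, topdown=True):
        yield root, dirs, files
        # Prune after the caller has seen this level: emptying dirs in place
        # stops walk() from descending any further.
        if root.count(os.sep) - base >= max_depth - 1:
            del dirs[:]
```

The loop from the question then becomes for root, dirs, files in walk_limited(stuff, 2): followed by the same print call.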