Hey all, this is my first time working with the file and os side of Python. I am trying to search a directory and find all of its subdirectories. If a directory contains no folders, I want to add all of its files to a list, and organize everything as nested dicts.
So, for instance, a tree could look like this:
- Starting Path
  - Dir 1
    - Subdir 1
    - Subdir 2
    - Subdir 3
      - subsubdir
        - folder1
          - file1.jpg
          - file2.jpg
        - folder2
          - file3.jpg
          - file4.jpg
Even if subsubdir has a file in it, it should be skipped, because it contains folders.
I can do this when I know in advance how many levels of directories there are, using os.listdir and os.path.isdir. However, to make it dynamic, it has to handle any number of folders and subfolders. I have tried os.walk, and it finds all the files easily. The only trouble I am having is building the dicts from the paths of the folders that contain files. I need the folder names organized as nested dicts, relative to the starting path.
So in the end, using the example above, the dict should look like this with the files in it:
dict['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']
dict['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']
Would appreciate any help on this or better ideas on organizing the information. Thanks.
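Since os.walk already finds the files, it may help to see exactly what it yields. Below is a minimal sketch, assuming a trimmed, made-up copy of the example tree (dir1/subdir3/subsubdir/folder1 and folder2) created in a temporary directory. For each directory, os.walk gives a (dirpath, dirnames, filenames) tuple, and an empty dirnames is precisely the "this folder has no subfolders" test the question needs.

```python
import os
import tempfile

# Illustrative tree mirroring the question's example (names are made up).
FILES = {
    os.path.join('dir1', 'subdir3', 'subsubdir', 'folder1'): ['file1.jpg', 'file2.jpg'],
    os.path.join('dir1', 'subdir3', 'subsubdir', 'folder2'): ['file3.jpg', 'file4.jpg'],
}

walked = []
with tempfile.TemporaryDirectory() as top:
    for folder, names in FILES.items():
        os.makedirs(os.path.join(top, folder))
        for name in names:
            # Create an empty file.
            open(os.path.join(top, folder, name), 'w').close()

    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # starting with `top` itself (top-down order by default).
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)   # '.' for the starting path
        is_leaf = not dirnames                # no subfolders -> keep its files
        walked.append((rel, sorted(dirnames), sorted(filenames)))
        print(rel, sorted(dirnames), sorted(filenames), 'leaf' if is_leaf else '')
```

Sibling order within a directory is whatever the filesystem returns, which is why the sketch sorts the names before printing.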
Maybe you want something like:
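import os

def explore(starting_path):
    # '' is a placeholder key for the starting directory itself.
    alld = {'': {}}
    for dirpath, dirnames, filenames in os.walk(starting_path):
        # Path of this directory relative to starting_path: '.', 'b', 'b/z', ...
        rel = os.path.relpath(dirpath, starting_path)
        parts = [''] if rel == os.curdir else [''] + rel.split(os.sep)
        d = alld
        for subd in parts:
            based = d      # parent of the directory we are descending into
            d = d[subd]    # created when the parent was visited (top-down walk)
        if dirnames:
            # Directory with subfolders: add an empty dict per subfolder
            # (files directly in this directory are ignored).
            for dn in dirnames:
                d[dn] = {}
        else:
            # Leaf directory: replace its placeholder dict with its file list.
            based[subd] = filenames
    return alld['']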
For example, given a directory /tmp/a such that:

$ ls -FR /tmp/a
b/  c/  d/

/tmp/a/b:
z/

/tmp/a/b/z:

/tmp/a/c:
za  zu

/tmp/a/d:
then

print(explore('/tmp/a'))

emits {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []} (key and file order may differ, because os.walk does not sort directory entries).
If this isn't exactly what you're after, could you show us specifically how the result should differ? I suspect any differences can be fixed easily.
There is a basic problem with the way you want to structure the data. If dir1/subdir1 contains both subdirectories and files, should dict['dir1']['subdir1'] be a list or a dictionary? To access further subdirectories with ...['subdir2'] it needs to be a dictionary, but on the other hand dict['dir1']['subdir1'] should return a list of files.
Either you have to build the tree from custom objects that combine these two aspects in some way, or you have to change the tree structure to treat files differently.
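As a sketch of the first option, a small custom node type can combine both aspects: a dict subclass whose keys are subfolders, plus a `files` attribute holding the file names. `Folder` and `build_tree` are hypothetical names used only for illustration, not part of any library, and the demo tree is made up. Unlike the layout requested in the question, this keeps the files of non-leaf folders too, so "skip folders that contain folders" becomes a check you apply when reading the tree (for example, `if not node: use node.files`).

```python
import os
import tempfile

class Folder(dict):
    """Directory node: folder['name'] is a subfolder, folder.files its file names."""

    def __init__(self):
        super().__init__()
        self.files = []

def build_tree(starting_path):
    root = Folder()
    for dirpath, dirnames, filenames in os.walk(starting_path):
        rel = os.path.relpath(dirpath, starting_path)
        node = root
        if rel != os.curdir:
            for part in rel.split(os.sep):
                node = node[part]          # created when the parent was visited
        node.files = sorted(filenames)
        for dn in dirnames:
            node[dn] = Folder()
    return root

# Demo on a throwaway tree; the names are illustrative only.
with tempfile.TemporaryDirectory() as top:
    base = os.path.join(top, 'dir1', 'subdir3', 'subsubdir')
    for folder, names in (('folder1', ['file1.jpg', 'file2.jpg']),
                          ('folder2', ['file3.jpg', 'file4.jpg'])):
        os.makedirs(os.path.join(base, folder))
        for name in names:
            open(os.path.join(base, folder, name), 'w').close()
    open(os.path.join(base, 'notes.txt'), 'w').close()   # file next to subfolders
    tree = build_tree(top)

sub = tree['dir1']['subdir3']['subsubdir']
print(sorted(sub))           # ['folder1', 'folder2']
print(sub.files)             # ['notes.txt']
print(sub['folder1'].files)  # ['file1.jpg', 'file2.jpg']
```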
I don't know why you would want to do this. You should be able to do your processing directly inside an os.walk loop, but in case you really need such a structure, you can do (untested):
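import os

def dirfunc(fdict, dirname, dirnames, fnames):
    # Skip any directory that has subfolders; only leaf folders are stored.
    if dirnames:
        return
    tmpdict = fdict
    keys = dirname.split(os.sep)
    # Create (or reuse) one nested dict per parent folder name.
    for k in keys[:-1]:
        tmpdict = tmpdict.setdefault(k, {})
    # Store the files under the folder's own name, not its full path.
    tmpdict[keys[-1]] = fnames

mydict = {}
# os.walk yields (dirpath, dirnames, filenames) tuples; it takes no callback.
for dirpath, dirnames, fnames in os.walk(directory_to_search):
    # Use paths relative to the starting directory, e.g. 'dir1/subdir3/...'.
    rel = os.path.relpath(dirpath, directory_to_search)
    dirfunc(mydict, rel, dirnames, fnames)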
Also, you should not name your variable dict, because that is the name of a Python built-in. It is a very bad idea to rebind the name dict to anything other than Python's dict type.
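A short illustration of why, assuming the code runs at module level as in the question: once `dict` is bound to a dict instance, calling `dict(...)` calls that instance instead of the type and fails, until the shadowing name is removed again.

```python
# Rebinding the name `dict` hides the built-in type for the rest of the module.
dict = {}                      # looks harmless...

try:
    counts = dict(a=1)         # ...but this now calls the {} instance, not the type
except TypeError as exc:
    error_message = str(exc)
    print(error_message)       # 'dict' object is not callable

del dict                       # deleting the global name exposes the built-in again
counts = dict(a=1)
print(counts)                  # {'a': 1}
```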
Edit: fixed the "double last key" error and switched to os.walk.