Python directory searching and organizing by dict

Posted 2019-08-25 12:28

Question:

Hey all, I have only recently started working with the file and os parts of Python. I am trying to search a directory and find all of its subdirectories. If a directory contains no folders, all of its files should be added to a list, and everything should be organized in a dict.

So for instance a tree could look like this

  • Starting Path
    • Dir 1
      • Subdir 1
      • Subdir 2
      • Subdir 3
        • subsubdir
          • file.jpg
          • folder1
            • file1.jpg
            • file2.jpg
          • folder2
            • file3.jpg
            • file4.jpg

Even if subsubdir has a file in it, it should be skipped because it has folders in it.

I can normally do this with os.listdir and os.path.isdir if I know how many directory levels I am looking at. To make it dynamic, though, it has to handle any number of folders and subfolders. I have tried os.walk, and it finds all the files easily. The only trouble I am having is creating the nested dicts for the paths that contain files. I need the folder names organized by dict, relative to the starting path.

So in the end, using the example above, the dict should look like this with the files in it:

dict['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']

dict['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']

Would appreciate any help on this or better ideas on organizing the information. Thanks.

Answer 1:

Maybe you want something like:

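import os

def explore(starting_path):
  # Assumes starting_path has no trailing path separator.
  alld = {'': {}}

  for dirpath, dirnames, filenames in os.walk(starting_path):
    d = alld
    # Path relative to starting_path; its first component is always ''.
    dirpath = dirpath[len(starting_path):]
    for subd in dirpath.split(os.sep):
      based = d
      d = d[subd]
    if dirnames:
      # Has folders: add a nested dict per folder and ignore the files.
      for dn in dirnames:
        d[dn] = {}
    else:
      # No folders: replace the placeholder dict with the list of files.
      based[subd] = filenames
  return alld['']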

For example, given a /tmp/a such that:

$ ls -FR /tmp/a
b/  c/  d/

/tmp/a/b:
z/

/tmp/a/b/z:

/tmp/a/c:
za  zu

/tmp/a/d:

print(explore('/tmp/a')) emits (key order may vary): {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []}.

If this isn't exactly what you're after, maybe you can show us specifically what the differences are supposed to be? I suspect they can probably be easily fixed, if need be.



Answer 2:

There is a basic problem with the way you want to structure the data. If dir1/subdir1 contains subdirectories and files, should dict['dir1']['subdir1'] be a list or a dictionary? To access further subdirectories with ...['subdir2'] it needs to be a dictionary, but on the other hand dict['dir1']['subdir1'] should return a list of files.

Either you have to build the tree from custom objects that combine these two aspects in some way, or you have to change the tree structure to treat files differently.
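
As a rough illustration of the first option, here is a minimal sketch of a custom node object that carries both a file list and its subdirectories. The names DirNode and build_tree are hypothetical and not part of the question or of any library; the sketch assumes os.walk is run top-down with its default settings.

```python
import os

class DirNode(object):
    """Hypothetical tree node: holds a directory's files and its subfolders."""

    def __init__(self):
        self.files = []     # plain file names found directly in this directory
        self.subdirs = {}   # folder name -> DirNode

    def __getitem__(self, name):
        # Allows tree['dir1']['subdir3'] style access to subfolders.
        return self.subdirs[name]

def build_tree(starting_path):
    root = DirNode()
    # Map each walked path to its node so children can be attached to parents.
    nodes = {starting_path: root}
    for dirpath, dirnames, filenames in os.walk(starting_path):
        node = nodes[dirpath]
        node.files = sorted(filenames)
        for dn in dirnames:
            child = DirNode()
            node.subdirs[dn] = child
            nodes[os.path.join(dirpath, dn)] = child
    return root
```

With a node like this, tree['dir1']['subdir3']['subsubdir'].files would give the files of subsubdir while tree['dir1']['subdir3']['subsubdir']['folder1'] still reaches the subfolder, so neither aspect has to be dropped.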



Answer 3:

I don't know why you would want to do this. You should be able to do your processing using os.walk directly, but in case you really need such a structure, you can do (untested):

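import os

def dirfunc(fdict, root, dirname, dirnames, fnames):
    # Only directories without subfolders keep their files.
    if dirnames:
        return
    # Build the key path relative to the starting directory.
    keys = os.path.relpath(dirname, root).split(os.sep)
    tmpdict = fdict
    for k in keys[:-1]:
        tmpdict = tmpdict.setdefault(k, {})
    tmpdict[keys[-1]] = fnames

# directory_to_search is a placeholder for your starting path.
mydict = {}
for dirname, dirnames, fnames in os.walk(directory_to_search):
    dirfunc(mydict, directory_to_search, dirname, dirnames, fnames)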

Also, you should not name your variable dict because it's a Python built-in. It is a very bad idea to rebind the name dict to something other than Python's dict type.
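
To make the shadowing problem concrete, here is a small sketch; the function name shadowing_demo is made up for illustration. Once the name dict is rebound, calling it as a constructor in that scope no longer works.

```python
def shadowing_demo():
    # Rebinding the name hides the built-in type inside this scope.
    dict = {'dir1': {}}
    try:
        dict(a=1)  # would normally build {'a': 1}
    except TypeError as exc:
        # The name now refers to a dict instance, which cannot be called.
        return str(exc)

print(shadowing_demo())
```

The shadowing is kept inside a function here so that the built-in stays usable in the rest of the module.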

Edit: edited to fix the "double last key" error and to use os.walk.