Listing files in a directory with Python when the directory is huge

Published 2019-07-19 09:58

I'm trying to deal with many files in Python. I first need to get a list of all the files in a single directory. At the moment, I'm using:

os.listdir(dir)

However, this isn't feasible, since the directory I'm searching has upwards of 81,000 files in it and totals almost 5 gigabytes.

What's the best way of stepping through each file one by one, without Windows deciding that the Python process is not responding and killing it? Because that tends to happen.

It's being run on a 32-bit Windows XP machine, so it can't address more than 4 GB of RAM.

Any other ideas from anyone to solve this problem?

2 Answers
神经病院院长 · 2019-07-19 10:14

You could use glob.iglob to avoid reading the entire list of filenames into memory. This returns a generator object allowing you to step through the filenames in your directory one by one:

import glob

pathname = 'C:/big_dir/'            # directory to scan, trailing separator included
files = glob.iglob(pathname + '*')  # lazy iterator; never builds the full list in memory

for f in files:
    print(f)  # do something with f
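
For instance, here's a minimal sketch (the directory path is a placeholder, and the size-totalling loop stands in for whatever per-file work you need) showing that the loop touches one entry at a time:

import glob
import os

directory = 'C:/big_dir'  # hypothetical path; point this at your directory

total_bytes = 0
for path in glob.iglob(os.path.join(directory, '*')):
    if os.path.isfile(path):                  # skip subdirectories
        total_bytes += os.path.getsize(path)  # stand-in for real per-file work

print('Scanned %d bytes' % total_bytes)

One caveat: on Python 2, glob builds each directory listing internally with os.listdir before yielding names, so the memory savings are smaller than the lazy interface suggests; on Python 3.5+ the module iterates with os.scandir under the hood.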
姐就是有狂的资本 · 2019-07-19 10:38

You may want to try using the scandir module:

scandir is a module which provides a generator version of os.listdir() that also exposes the extra file information the operating system returns when you iterate a directory. scandir also provides a much faster version of os.walk(), because it can use the extra file information exposed by the scandir() function.

There's an accepted PEP (PEP 471) to merge it into the Python standard library, and as of Python 3.5 it ships there as os.scandir().

Simple usage example from their docs:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name
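
Applied to this question, here's a minimal sketch (assuming Python 3.5+, where scandir ships in the standard library as os.scandir; the path is a placeholder) that steps through a large directory one regular file at a time:

import os

def iter_files(path):
    """Yield full paths of regular files in the given directory, lazily."""
    for entry in os.scandir(path):
        if entry.is_file():  # uses metadata cached during iteration
            yield entry.path

for filepath in iter_files('C:/big_dir'):  # hypothetical path
    print(filepath)  # do something with each file

Because each DirEntry already carries the file-type information the OS returned while listing, this avoids a separate system call per file, which is where most of the os.walk()/os.listdir() overhead goes.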