I'm trying to deal with many files in Python. I first need to get a list of all the files in a single directory. At the moment, I'm using:
os.listdir(dir)
However, this isn't feasible, since the directory I'm searching has upwards of 81,000 files in it and totals almost 5 gigabytes.
What's the best way of stepping through each file one by one, without Windows deciding that the Python process is not responding and killing it? Because that tends to happen.
It's being run on a 32-bit Windows XP machine, so clearly it can't address more than 4 GB of RAM.
Does anyone have any other ideas to solve this problem?
You could use
glob.iglob
to avoid reading the entire list of filenames into memory. It returns a generator object that lets you step through the filenames in your directory one by one:
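For instance, a minimal sketch; the directory path and the process_file() helper below are placeholders, not part of the original answer:

    import glob
    import os

    directory = r'C:\big_directory'  # hypothetical path, substitute your own

    # iglob returns an iterator, so filenames are yielded one at a time
    # rather than being collected into an 81,000-entry list up front.
    for path in glob.iglob(os.path.join(directory, '*')):
        process_file(path)  # process_file() is a placeholder for your own handling code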
You may want to try using the scandir module. There's an accepted PEP proposing to merge it into the Python standard library, so it seems to have some traction.
Simple usage example, along the lines of the one in their docs (the directory path below is a placeholder):
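    import scandir  # third-party package: pip install scandir

    def list_files(path):
        """Yield the names of regular files in `path`, one entry at a time."""
        for entry in scandir.scandir(path):
            if entry.is_file():
                yield entry.name

    # r'C:\big_directory' is a hypothetical path used only for illustration.
    for name in list_files(r'C:\big_directory'):
        print(name)

Because scandir yields DirEntry objects lazily and gets file-type information from the directory listing itself, you avoid both building the full filename list in memory and making an extra stat call per file.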