I'm using os.walk and fnmatch with filters to search a PC's hard drive for all image files. This works perfectly fine but is extremely slow: it takes about 9 minutes to find roughly 70,000 images.
Any ideas on optimizing this code to run faster?
Any other suggestions?
I'm using Python 2.7.2, by the way.

    import fnmatch
    import os

    images = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']
    matches = []

    for root, dirnames, filenames in os.walk("C:\\"):
        for extension in images:
            for filename in fnmatch.filter(filenames, extension):
                matches.append(os.path.join(root, filename))

I'm not one of those regex maniacs who always resorts to the re hammer to solve all problems, but in my tests this actually ran a wee bit over twice as fast as your fnmatch version:

    import os
    import re

    matches = []
    img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)

    for root, dirnames, filenames in os.walk(r"C:\windows"):
        matches.extend(os.path.join(root, name) for name in filenames if img_re.match(name))

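To check a speed claim like this without touching the disk, one option is to time only the matching step on an in-memory list. The sketch below is an illustration, not part of the original test: the file names are synthetic, and `with_fnmatch` and `with_regex` are hypothetical helper names. The numbers it prints will vary by machine and Python version.

```python
import fnmatch
import re
import timeit

# Synthetic file names, made up for illustration (not read from a real disk).
names = ["file%d.%s" % (i, ext)
         for i in range(1000)
         for ext in ("jpg", "PNG", "txt", "dll", "tiff")]

patterns = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']
img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)


def with_fnmatch():
    # One pass over the whole list per pattern, as in the question.
    found = []
    for pattern in patterns:
        found.extend(fnmatch.filter(names, pattern))
    return found


def with_regex():
    # A single pass over the list with one compiled pattern.
    return [name for name in names if img_re.match(name)]


if __name__ == "__main__":
    for func in (with_fnmatch, with_regex):
        seconds = timeit.timeit(func, number=20)
        print("%s: %.3f s for 20 runs" % (func.__name__, seconds))
```

Note that the two approaches are not strictly equivalent: fnmatch.filter normalizes case according to the operating system's rules, so it is case-insensitive on Windows but case-sensitive on most other systems, while the regex above ignores case everywhere.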
The Python looks pretty much ok to me.
You could experiment with

    for root, dirnames, filenames in os.walk("C:\\"):
        for extension in images:
            matches.extend(os.path.join(root, filename) for filename
                           in fnmatch.filter(filenames, extension))

If that does not make a difference (I suppose it will not), I believe your hard disk has become the bottleneck in the process (remember, disk == slow, and you're iterating over and listing the files of every directory on your system).
If the hard disk is the bottleneck, the results from multiple dir /s ... statements should not be dramatically faster than the Python solution.
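Another way to test the bottleneck theory from within Python is to time the directory traversal separately from the filtering. This is a minimal sketch under stated assumptions: `time_walk` is a hypothetical helper, it uses a simple suffix check rather than fnmatch, and the demo walks the current directory instead of the whole drive.

```python
import os
import time

# Suffixes to look for (same set of image types as in the question).
EXTNS = ('.jpg', '.jpeg', '.png', '.tif', '.tiff')


def time_walk(top):
    """Return (walk_seconds, filter_seconds, matches) for the tree under top."""
    start = time.time()
    # Pass 1: traverse the tree and keep the names; no matching yet.
    listing = [(root, filenames) for root, dirnames, filenames in os.walk(top)]
    walk_seconds = time.time() - start

    start = time.time()
    # Pass 2: filter purely in memory; no disk access happens here.
    matches = [os.path.join(root, name)
               for root, filenames in listing
               for name in filenames
               if name.lower().endswith(EXTNS)]
    filter_seconds = time.time() - start
    return walk_seconds, filter_seconds, matches


if __name__ == "__main__":
    # The current directory is an assumption for the demo; pass "C:\\" to scan a drive.
    walk_s, filter_s, found = time_walk(os.curdir)
    print("walk: %.2f s, filter: %.2f s, matches: %d" % (walk_s, filter_s, len(found)))
```

If the walk time dominates, changing the matching code will not help much. Keep in mind that a second run is often much faster because the operating system caches directory data, so compare runs under similar conditions.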

    import os

    extns = ('.jpg', '.jpeg', '.png', '.tif', '.tiff')
    matches = []

    for root, dirnames, fns in os.walk("C:\\"):
        matches.extend(
            os.path.join(root, fn) for fn in fns if fn.lower().endswith(extns)
        )