Quicker to os.walk or glob?

I'm messing around with file lookups in Python on a large hard disk. I've been looking at os.walk and glob. I usually use os.walk, as I find it much neater and it seems to be quicker (for directories of the usual size).

Does anyone have experience with both who could say which is more efficient? As I say, glob seems to be slower, but it lets you use wildcards etc., whereas with walk you have to filter the results yourself. Here is an example of looking up core dumps.

import os
import re

core = re.compile(r"core\.\d*")
for root, dirs, files in os.walk("/path/to/dir/"):
    for file in files:
        if core.search(file):
            path = os.path.join(root, file)
            print("Deleting: " + path)
            os.remove(path)

from glob import iglob

for file in iglob("/path/to/dir/core.*"):
    print("Deleting: " + file)
    os.remove(file)

Tags: python traversal glob os.walk directory-walk

4 answers

Root（大扎）

Post #2 · 2020-02-16 09:16

Don't waste your time on optimization before measuring/profiling. Focus on making your code simple and easy to maintain.

For example, in your code you precompile the RE, which gives you little or no speed boost, because the re module keeps an internal cache (re._cache) of compiled REs.
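If you want to check the caching claim on your own machine rather than take it on faith, a minimal timing sketch could look like the one below. The sample file names and the repeat count are made up for illustration. The module-level call still pays for a cache lookup on every call, so the two timings may not be identical; the point is to measure before deciding.

```python
import re
import timeit

PATTERN = r"core\.\d*"  # same pattern as in the question
# Made-up sample file names, purely for illustration.
NAMES = ["core.1234", "notes.txt", "core.", "score.9"]

compiled = re.compile(PATTERN)


def with_compiled():
    # Uses the pattern object compiled once up front.
    return [bool(compiled.search(name)) for name in NAMES]


def with_module_level():
    # Passes the pattern string each time; re looks it up in its cache.
    return [bool(re.search(PATTERN, name)) for name in NAMES]


if __name__ == "__main__":
    for fn in (with_compiled, with_module_level):
        seconds = timeit.timeit(fn, number=20000)
        print("%s: %.4fs" % (fn.__name__, seconds))
```

Both functions return the same matches, so whichever reads better can be used until a profile says otherwise.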

Keep it simple.
If it's slow, then profile.
Once you know exactly what needs to be optimized, make your tweaks and always document them.
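The "profile" step above can be sketched with the standard cProfile and pstats modules. The count_files workload and the profile_count helper are hypothetical names chosen for this example, not something from the question; swap in whatever function you suspect is slow.

```python
import cProfile
import io
import os
import pstats


def count_files(top):
    # Hypothetical workload: count every file under `top` with os.walk.
    total = 0
    for root, dirs, files in os.walk(top):
        total += len(files)
    return total


def profile_count(top):
    # Run count_files under cProfile and return (result, report text).
    profiler = cProfile.Profile()
    result = profiler.runcall(count_files, top)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    total, report = profile_count(".")
    print("%d files found" % total)
    print(report)
```

The report shows where the time actually goes, which is what should drive any later tweak.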

Note that an optimization made several years ago can end up making the code run slower than the "non-optimized" version would today. This applies especially to modern JIT-based languages.


淡お忘

Post #3 · 2020-02-16 09:16

I think even with glob you would still have to use os.walk, unless you know exactly how deep your subdirectory tree is.

By the way, the glob documentation says:

"*, ?, and character ranges expressed with [] will be correctly matched. This is done by using the os.listdir() and fnmatch.fnmatch() functions"

I would simply go with:

import fnmatch, os, shutil

# path, search_str and dest are supplied by the caller
for root, subdirs, files in os.walk(path):
    for name in fnmatch.filter(files, search_str):
        shutil.copy(os.path.join(root, name), dest)
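To make the documentation quote concrete, here is a rough sketch of what "os.listdir() plus fnmatch" amounts to for a single directory. simple_glob is a hypothetical helper written for this example, not the library's actual implementation, and newer Python versions may do the listing differently internally.

```python
import fnmatch
import glob
import os


def simple_glob(directory, pattern):
    # Rough stand-in for glob.glob(os.path.join(directory, pattern)) when
    # only the last path component contains wildcards.
    names = os.listdir(directory)
    # glob skips dot-files unless the pattern itself starts with a dot.
    if not pattern.startswith("."):
        names = [name for name in names if not name.startswith(".")]
    return sorted(os.path.join(directory, name)
                  for name in fnmatch.filter(names, pattern))
```

Since the matching is the same fnmatch logic either way, any speed difference between glob and a filtered os.walk probably comes mostly from how many directories each one has to list.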


唯我独甜

Post #4 · 2020-02-16 09:21

You can use os.walk and still use glob-style matching.

import fnmatch
import os

for root, dirs, files in os.walk(DIRECTORY):
    for file in files:
        if fnmatch.fnmatch(file, PATTERN):
            print(file)

Not sure about speed, but since os.walk is recursive and a plain glob pattern is not, they obviously do different things.
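A small sketch of that difference, using hypothetical helper names. The third function assumes Python 3.5 or newer, where glob accepts recursive=True and the "**" wildcard; on older versions only the first two apply.

```python
import fnmatch
import glob
import os


def find_with_walk(top, pattern):
    # Recursive: visits `top` and every directory below it.
    matches = []
    for root, dirs, files in os.walk(top):
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return sorted(matches)


def find_with_glob(top, pattern):
    # Not recursive: only looks at entries directly inside `top`.
    return sorted(glob.glob(os.path.join(top, pattern)))


def find_with_recursive_glob(top, pattern):
    # Python 3.5+ only: "**" with recursive=True matches any depth,
    # including zero directories.
    return sorted(glob.glob(os.path.join(top, "**", pattern), recursive=True))
```

Note that the glob variants also match directories whose names fit the pattern and skip hidden files by default, while the os.walk version only looks at file names, so the results can differ on real trees.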


虎瘦雄心在

Post #5 · 2020-02-16 09:38

I ran a test on a small cache of web pages spread over 1000 directories. The task was to count the total number of files in those directories. The output is:

os.listdir: 0.7268s, 1326786 files found
os.walk: 3.6592s, 1326787 files found
glob.glob: 2.0133s, 1326786 files found

As you can see, os.listdir is the quickest of the three, and glob.glob is still quicker than os.walk for this task.

The source:

import os, time, glob

# Count files by listing each of the 1000 numbered directories directly.
n, t = 0, time.time()
for i in range(1000):
    n += len(os.listdir("./%d" % i))
t = time.time() - t
print("os.listdir: %.4fs, %d files found" % (t, n))

# Count files by walking the whole tree from the current directory.
n, t = 0, time.time()
for root, dirs, files in os.walk("./"):
    for file in files:
        n += 1
t = time.time() - t
print("os.walk: %.4fs, %d files found" % (t, n))

# Count files by globbing "*" inside each numbered directory.
n, t = 0, time.time()
for i in range(1000):
    n += len(glob.glob("./%d/*" % i))
t = time.time() - t
print("glob.glob: %.4fs, %d files found" % (t, n))
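If you do not have a cache like this lying around, the same comparison can be reproduced on a throwaway tree. This is only a sketch in Python 3: the tree size (20 directories of 50 empty files) and the helper names are assumptions made for the example, and timings on such a small, freshly created tree will not resemble the figures above.

```python
import glob
import os
import tempfile
import time


def build_tree(top, n_dirs=20, files_per_dir=50):
    # Create directories named 0..n_dirs-1, each holding empty files.
    for i in range(n_dirs):
        directory = os.path.join(top, str(i))
        os.mkdir(directory)
        for j in range(files_per_dir):
            open(os.path.join(directory, "f%d.html" % j), "w").close()


def count_listdir(top, n_dirs):
    return sum(len(os.listdir(os.path.join(top, str(i))))
               for i in range(n_dirs))


def count_walk(top):
    return sum(len(files) for root, dirs, files in os.walk(top))


def count_glob(top, n_dirs):
    return sum(len(glob.glob(os.path.join(top, str(i), "*")))
               for i in range(n_dirs))


def timed(label, fn, *args):
    # Time a single call and print it in the same format as the output above.
    start = time.perf_counter()
    n = fn(*args)
    elapsed = time.perf_counter() - start
    print("%s: %.4fs, %d files found" % (label, elapsed, n))
    return n


if __name__ == "__main__":
    n_dirs = 20
    with tempfile.TemporaryDirectory() as top:
        build_tree(top, n_dirs)
        timed("os.listdir", count_listdir, top, n_dirs)
        timed("os.walk", count_walk, top)
        timed("glob.glob", count_glob, top, n_dirs)
```

Disk caching can dominate results like these, so it is worth running the script more than once before drawing conclusions.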
