Question:
I am trying to improve the performance of elFinder, an AJAX-based file manager (elRTE.ru).
It uses os.listdir recursively to walk through all directories, and it takes a performance hit (listing a directory with 3000+ files takes 7 seconds).
Here is its walking function:
for d in os.listdir(path):
    pd = os.path.join(path, d)
    if os.path.isdir(pd) and not os.path.islink(pd) and self.__isAccepted(d):
        tree['dirs'].append(self.__tree(pd))
My questions are:
- If I switch from os.listdir to os.walk, will it improve performance?
- How about using dircache.listdir()? Cache the WHOLE directory/subdirectory contents on the initial request and return the cached results if no new files have been uploaded and nothing has changed (a sketch of this idea follows the list).
- Is there any other method of directory walking that is faster?
- Is there any other fast server-side file browser written in Python (though I would prefer to make this one fast)?
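A minimal sketch of that caching idea, assuming each directory's listing is keyed by its mtime (all names here are illustrative):

import os

_cache = {}  # path -> (mtime, listing)

def cached_listdir(path):
    # Re-read the directory only when its mtime changes; otherwise
    # serve the listing from memory.
    mtime = os.stat(path).st_mtime
    entry = _cache.get(path)
    if entry is None or entry[0] != mtime:
        entry = (mtime, os.listdir(path))
        _cache[path] = entry
    return entry[1]

(This is essentially what the old Python 2 dircache module did.)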
Answer 1:
I was just trying to figure out how to speed up os.walk on a largish file system (350,000 files spread across around 50,000 directories). I'm on a Linux box using an ext3 file system. I discovered that there is a way to speed this up for MY case.
Specifically, using a top-down walk, any time os.walk returns a list of more than one directory, I use os.stat to get the inode number of each directory and sort the directory list by inode number. This makes the walk visit the subdirectories mostly in inode order, which reduces disk seeks.
For my use case, it sped up my complete directory walk from 18 minutes down to 13 minutes...
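A minimal sketch of that inode-sorting trick (the generator name is illustrative; it relies on os.walk's documented behaviour that, with topdown=True, reordering dirs in place changes the visit order):

import os

def walk_inode_sorted(top):
    for root, dirs, files in os.walk(top, topdown=True):
        if len(dirs) > 1:
            # Visit subdirectories in (approximate) on-disk order to
            # reduce disk seeks on filesystems like ext3.
            dirs.sort(key=lambda d: os.stat(os.path.join(root, d)).st_ino)
        yield root, dirs, files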
Answer 2:
Did you check out scandir (previously betterwalk)? I did not try it myself, but there's a discussion about it here and another one here. It claims a speedup of 3-10x on Mac OS X/Linux and 7-50x on Windows by avoiding redundant calls to os.stat(). It has also been included in the standard library as of Python 3.5.
Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree. In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X.
From the project's readme.
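A short sketch of what the stat-free check looks like with os.scandir (Python 3.5+); each DirEntry exposes is_dir()/is_symlink() from the data the OS already returned:

import os

def list_subdirs(path):
    # follow_symlinks=False mirrors the islink check in the original
    # walking function, without an extra os.stat() per entry.
    return [e.name for e in os.scandir(path)
            if e.is_dir(follow_symlinks=False)]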
Answer 3:
You should measure directly on the machines (OSes, filesystems and caches thereof, etc.) of your specific interest -- whether or not os.walk is faster than os.listdir on a specific and totally different machine / OS / FS will tell you very little about performance on yours.
Not sure what you mean by dircache.listdir -- presumably the Python 2 dircache module, which simply caches os.listdir results (invalidated by the directory's mtime). listdir already reads the whole directory in one gulp (as it must sort the results), as does os.walk (as it must separate subdirectories from files). If, depending on your platform, you have a fast way of being notified about file/directory changes, then it's probably worth building the tree up once and editing it incrementally as change notifications come in... but it depends on the relative frequency of changes vs. requests, which is, again, totally dependent on your specific application circumstances.
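A hedged sketch of that incremental approach, using the third-party watchdog library (pip install watchdog); the cache structure and root path are illustrative:

import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class TreeInvalidator(FileSystemEventHandler):
    def __init__(self):
        self.dirty = set()  # directories whose cached listing is stale

    def on_any_event(self, event):
        # Invalidate only the directory that changed, not the whole tree.
        path = event.src_path if event.is_directory else os.path.dirname(event.src_path)
        self.dirty.add(path)

handler = TreeInvalidator()
observer = Observer()
observer.schedule(handler, '/srv/files', recursive=True)  # hypothetical root
observer.start()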
Answer 4:
In order:
I doubt you'll see much of a speed-up between os.walk and os.listdir, since both rely on the underlying filesystem. In fact, I suspect the underlying filesystem will have a big effect on the speed of the operation.
Any cache operation is going to be significantly faster than hitting the filesystem (at least for the second and subsequent checks).
You could always write some utility (or call a shell command) that generates the list of directories outside of Python, and call it through the subprocess module. But that's a little complicated, and I'd turn to that solution only if the cache turned out not to work for you.
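A hedged sketch of that subprocess idea, shelling out to find (the root path is illustrative; capture_output requires Python 3.7+):

import subprocess

result = subprocess.run(
    ['find', '/srv/files', '-type', 'd'],  # list directories only
    capture_output=True, text=True, check=True,
)
dirs = result.stdout.splitlines()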
If you haven't located a file browser on the Cheeseshop, you probably won't find one.
Answer 5:
How about doing it in bash?
import subprocess

command = 'ls .... or something else'
# With shell=True, pass the command line as a single string, not a list.
subprocess.Popen(command, shell=True)
In my case, which was changing permissions on thousands of files, this has worked much better.
Answer 6:
You are looking for fsdir. It's written in C and is made to work with Python. It is much faster than walking the tree with the standard Python libraries.
Answer 7:
os.path.walk may increase your performance, for two reasons:
1) If you can stop walking before you've walked everything, then it will indeed be faster than listdir, although only noticeably so when dealing with large trees.
2) If you're listing HUGE directories, then it can be expensive to build the list returned by listdir. (Not true, see Alex's comment below.)
However, it probably won't make a difference and may in fact be slower, due to the potentially extra overhead incurred by calling your visit function and doing all the extra argument packing and unpacking.
(Really the only way to answer this question is to test it yourself -- it should only take a few minutes.)
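For reference, a minimal Python 2-only sketch of the os.path.walk style (it was removed in Python 3 in favour of os.walk); the root path is illustrative, and the visit callback can prune names in place to stop descending:

import os

def visit(counts, dirname, names):
    # Called once per directory; removing entries from `names`
    # prevents os.path.walk from descending into them.
    counts[dirname] = len(names)
    names[:] = [n for n in names if not n.startswith('.')]

counts = {}
os.path.walk('/srv/files', visit, counts)  # hypothetical root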