If there are around 1,000,000 individual files (mostly about 100k in size) in a single flat directory (no subdirectories), are there going to be any compromises in efficiency, or disadvantages in any other way?
The obvious answer is that the folder will be extremely difficult for humans to use long before you hit any technical limit (the time taken to read the output of ls, for one; there are dozens of other reasons). Is there a good reason why you can't split it into subfolders?
Not every filesystem supports that many files.
On some of them (ext2, ext3, ext4) it's very easy to hit the inode limit.
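To get a feel for how close a filesystem is to that inode limit, here is a minimal Python sketch using os.statvfs. The helper names inode_usage and can_hold are made up for illustration, and treating a reported total of 0 as "no fixed limit" is an assumption about filesystems that allocate inodes dynamically.

```python
import os

def inode_usage(path="."):
    """Return (total_inodes, free_inodes) for the filesystem containing path."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree

def can_hold(path, new_files):
    """Rough check: are there enough free inodes for new_files more files?

    Assumption: a reported total of 0 means the filesystem does not
    expose a fixed inode count, so no limit is applied here.
    """
    total, free = inode_usage(path)
    return total == 0 or free >= new_files

if __name__ == "__main__":
    total, free = inode_usage(".")
    print(f"inodes: {total} total, {free} free")
    print("room for 1,000,000 more files:", can_hold(".", 1_000_000))
```

Each file needs one inode regardless of its size, so a filesystem can run out of inodes while it still has plenty of free blocks.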
ARG_MAX is going to take issue with that. For instance, rm -rf * (while in the directory) is going to fail with "Argument list too long". Utilities that do some kind of globbing (or rely on the shell to do it) will have some functionality break.
If that directory is available to the public (let's say via FTP or a web server), you may encounter additional problems.
The effect depends entirely on the file system in use. How frequently are these files accessed, and what is the file system? Remember, Linux (by default) prefers keeping recently accessed files in memory while putting processes into swap, depending on your settings. Is this directory served via HTTP? Is Google going to see and crawl it? If so, you might need to adjust VFS cache pressure and swappiness.
Edit:
ARG_MAX is a system-wide limit on the total size of the arguments (and environment) that can be passed to a program's entry point. So, let's take rm and the example rm -rf *: the shell is going to turn * into a list of file names, which in turn becomes the arguments to rm.
The same thing is going to happen with ls, and several other tools. For instance, ls foo* might break if too many files start with 'foo'.
I'd advise (no matter what fs is in use) to break it up into smaller directory chunks, just for that reason alone.
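To make the ARG_MAX problem concrete, here is a rough Python sketch that estimates how many bytes an expanded glob would need and compares that with the limit reported by os.sysconf. The helper names are hypothetical. The estimate counts only the argument strings and their terminating NUL bytes and ignores the environment and pointer overhead, so the real headroom is somewhat smaller.

```python
import os

def argv_bytes(names, command=("rm", "-rf")):
    """Approximate bytes needed for command + names (each string plus a NUL)."""
    return sum(len(os.fsencode(arg)) + 1 for arg in (*command, *names))

def would_exceed_arg_max(names, arg_max=None):
    """True if the expanded argument list is larger than ARG_MAX."""
    if arg_max is None:
        arg_max = os.sysconf("SC_ARG_MAX")  # limit reported by this system
    return argv_bytes(names) > arg_max

def iter_names(directory):
    """Walk a huge directory one entry at a time, with no glob expansion."""
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name

if __name__ == "__main__":
    names = list(iter_names("."))
    print(len(names), "entries,", argv_bytes(names), "bytes as arguments")
    print("exceeds ARG_MAX:", would_exceed_arg_max(names))
```

Iterating with os.scandir avoids the limit altogether because no argument list is ever built, which is the same reason find with -delete or -exec keeps working where rm * does not.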
My experience with large directories on ext3 with dir_index enabled:
- Reading the whole directory (for example, running ls on that directory) will take several minutes the first time. After that the directory stays in the kernel cache and there is no penalty anymore.
- Wildcard expansion (*) does not always work as expected anymore. This only matters if you really want to perform an operation on all the files at once.
Without dir_index, however, you are really screwed :-D
Most distros use ext3 by default, which can use b-tree indexing for large directories. Some distros have this dir_index feature enabled by default; in others you'd have to enable it yourself. If you enable it, there's no slowdown even for millions of files.
To see if the dir_index feature is activated, do (as root):
To activate the dir_index feature (as root):
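One way to script both steps, assuming the tune2fs and e2fsck utilities from e2fsprogs are installed (an assumption, not necessarily the commands the author had in mind): the Python sketch below parses the "Filesystem features:" line of a tune2fs -l listing and builds the commands that would switch the feature on. The function names are made up for illustration, and /dev/sdaX is a placeholder.

```python
import subprocess

def has_dir_index(tune2fs_listing):
    """Return True if the 'Filesystem features:' line lists dir_index."""
    for line in tune2fs_listing.splitlines():
        if line.startswith("Filesystem features:"):
            return "dir_index" in line.split(":", 1)[1].split()
    return False

def check_device(device):
    """Run `tune2fs -l DEVICE` (needs root) and inspect its output."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True, text=True, check=True,
    )
    return has_dir_index(result.stdout)

def enable_commands(device):
    """Commands that would turn dir_index on and re-index existing directories.

    Assumption: e2fsck -fD is run on an unmounted filesystem so that
    directories created before the change get indexed too.
    """
    return [
        ["tune2fs", "-O", "dir_index", device],
        ["e2fsck", "-fD", device],
    ]

if __name__ == "__main__":
    device = "/dev/sdaX"  # placeholder partition name
    for command in enable_commands(device):
        print(" ".join(command))
```

The sketch only prints the enabling commands rather than running them, since changing filesystem features is something to do deliberately and with a backup.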
Replace /dev/sdaX with the partition for which you want to activate it.
When you accidentally execute "ls" in that directory, or use tab completion, or want to execute "rm *", you'll be in big trouble. In addition, there may be performance issues depending on your file system.
It's considered good practice to group your files into directories which are named by the first 2 or 3 characters of the filenames, e.g.
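A minimal Python sketch of that prefix-based layout follows. The function names and the default prefix width of 2 are choices made here for illustration; nothing in the thread prescribes them.

```python
import os

def shard_path(root, filename, width=2):
    """Path of filename inside a subdirectory named after its first characters."""
    return os.path.join(root, filename[:width], filename)

def shard_directory(root, width=2):
    """Move every regular file in root into its prefix subdirectory."""
    # Collect the names first so the directory is not modified while it is scanned.
    with os.scandir(root) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    for name in names:
        target = shard_path(root, name, width)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(os.path.join(root, name), target)

if __name__ == "__main__":
    print(shard_path("data", "abcdef.txt"))
```

If many file names share the same leading characters, the subdirectories fill unevenly. Using a prefix of a hash of the name instead is a common variation, at the cost of paths that are harder to guess by eye.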