How does Windows with NTFS perform with large volumes of files and directories?
Is there any guidance around limits of files or directories you can place in a single directory before you run into performance problems or other issues?
E.g. is having a folder with 100,000 folders inside of it an OK thing to do?
I have first-hand experience with about 100,000 files (each several MB) in a single NTFS directory, from copying an online library.
It takes about 15 minutes to open the directory with Explorer or 7-zip.
Writing a site copy with winhttrack would always get stuck after some time. It also had to deal with a directory containing about 1,000,000 files. I think the worst part is that the MFT can only be traversed sequentially. Opening the same directory under ext2fsd on ext3 gave almost the same timing. Moving to ReiserFS (not Reiser4) might help.
Trying to avoid this situation altogether is probably the best approach.
For your own programs, storing blobs without any filesystem at all could be beneficial; that's the approach Facebook takes for storing photos.
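For illustration, here is a minimal Python sketch of that idea: one large append-only data file plus an in-memory offset index, so the filesystem never sees millions of individual files. The class name and layout are hypothetical (nothing like Facebook's actual Haystack format), and a real store would persist the index and handle deletes.

```
import os

class BlobStore:
    """Append-only blob store: one big data file plus an offset index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length); a real store would persist this
        self.f = open(path, "a+b")  # append + read, binary

    def put(self, key, data):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(data)          # appends go to the end of the single file
        self.f.flush()
        self.index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

# Usage: millions of small blobs hide behind one large file.
store = BlobStore("photos.dat")
store.put("img_001", b"...jpeg bytes...")
print(len(store.get("img_001")))
```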
Here's some advice from someone who works in an environment where we have folders containing tens of millions of files.
To answer your question more directly: If you're looking at 100K entries, no worries. Go knock yourself out. If you're looking at tens of millions of entries, then either:
a) Make plans to sub-divide them into sub-folders (a minimal sketch follows below this list). For example, let's say you have 100M files: it's better to store them in 1,000 folders, so that you only have 100,000 files per folder, than to put them all into one big folder. This creates 1,000 folder indexes instead of a single big one that's more likely to hit the maximum-number-of-fragments limit; or
b) Make plans to run contig.exe on a regular basis to keep your big folder's index defragmented.
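Option (a) can be as simple as hashing each file name into a fixed number of bucket folders. A rough Python sketch, assuming 1,000 buckets and a flat root\NNNN\filename layout (both arbitrary choices):

```
import hashlib
import os
import shutil

BUCKETS = 1000  # e.g. 100M files / 1,000 folders ~= 100K files per folder

def bucket_path(root, filename):
    """Map a file name to one of BUCKETS sub-folders via a stable hash."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % BUCKETS
    return os.path.join(root, f"{bucket:04d}", filename)

def store_file(root, src):
    """Copy src into its bucket folder under root, creating the folder on demand."""
    dest = bucket_path(root, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(src, dest)
    return dest

# Lookups use the same hash, so no directory scan is ever needed:
# open(bucket_path(r"D:\archive", "report-42.pdf"), "rb")
```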
Read below only if you're bored.
The actual limit isn't on the number of fragments, but on the number of records in the data segment that stores the pointers to the fragments.
So what you have is a data segment that stores pointers to the fragments of the directory data. The directory data stores information about the sub-directories and sub-files that the directory supposedly contains. Actually, a directory doesn't "store" anything; it's just a tracking and presentation feature that presents the illusion of a hierarchy to the user, since the storage medium itself is linear.
When you create a folder with N entries, you create a list of N items at the file-system level. This list is a system-wide shared data structure. If you then start modifying this list continuously by adding and removing entries, I'd expect at least some lock contention over the shared data. This contention can, in theory, negatively affect performance.
For read-only scenarios, I can't see any reason for performance degradation in directories with a large number of entries.
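If you want to check the write-contention theory on your own hardware, a rough Python sketch like the one below (thread and file counts are arbitrary) compares concurrent file creation in one shared folder against one folder per thread:

```
import os
import tempfile
import threading
import time

def create_files(folder, count, tag):
    for i in range(count):
        with open(os.path.join(folder, f"{tag}_{i}.tmp"), "wb") as f:
            f.write(b"x")

def timed_run(folders, files_per_thread=2000, threads=8):
    """Create files from several threads, round-robining over the given folders."""
    start = time.perf_counter()
    workers = [
        threading.Thread(target=create_files,
                         args=(folders[i % len(folders)], files_per_thread, f"t{i}"))
        for i in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

base = tempfile.mkdtemp()
shared = os.path.join(base, "shared")
os.makedirs(shared)
split = [os.path.join(base, f"split{i}") for i in range(8)]
for d in split:
    os.makedirs(d)

print("one shared folder :", timed_run([shared]), "s")
print("one folder/thread :", timed_run(split), "s")
```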
I am building a file structure to host up to 2 billion (2^32) files, and I performed the following tests, which show a sharp drop in navigate + read performance at about 250 files or 120 directories per NTFS directory on a solid-state drive (SSD):
Interestingly, the number of directories and the number of files do NOT significantly interact.
So the lessons are:
This is the data (2 measurements for each file and directory count):
And this is the test code:
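A minimal Python sketch of such a navigate + read measurement, with a placeholder path and file counts rather than the original test parameters (note that OS caching will skew repeat runs over the same folders):

```
import os
import time

def populate(folder, n_files):
    os.makedirs(folder, exist_ok=True)
    for i in range(n_files):
        with open(os.path.join(folder, f"f{i:07d}.dat"), "wb") as f:
            f.write(b"0" * 64)

def navigate_and_read(folder):
    """Enumerate the directory and read every file once; return elapsed seconds."""
    start = time.perf_counter()
    for entry in os.scandir(folder):
        if entry.is_file():
            with open(entry.path, "rb") as f:
                f.read()
    return time.perf_counter() - start

root = r"C:\temp\ntfs_bench"   # placeholder path
for n in (100, 250, 1000, 10000):
    folder = os.path.join(root, f"files_{n}")
    populate(folder, n)
    print(n, "files:", navigate_and_read(folder), "s")
```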
For local access, large numbers of directories/files don't seem to be an issue. However, if you're accessing them across a network, there's a noticeable performance hit after a few hundred, especially when accessed from Vista machines (XP talking to Windows Server with NTFS seemed to run much faster in that regard).
There are also performance problems with short (8.3) file name creation slowing things down. Microsoft recommends turning off short file name creation if you have more than 300,000 files in a folder [1]. The less unique the first six characters are, the bigger a problem this is.
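If you do hit that case, 8.3 short-name generation can be turned off with fsutil (requires an elevated prompt, and only affects files created after the change). A small Python wrapper, assuming you want to script it:

```
import subprocess

def disable_short_names():
    """Show the current setting, then disable 8.3 short-name creation system-wide.
    Equivalent to running: fsutil behavior set disable8dot3 1 (admin rights needed).
    Existing short names are not removed; only new files are affected."""
    subprocess.run(["fsutil", "behavior", "query", "disable8dot3"], check=True)
    subprocess.run(["fsutil", "behavior", "set", "disable8dot3", "1"], check=True)

if __name__ == "__main__":
    disable_short_names()
```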
[1] "How NTFS Works", http://technet.microsoft.com (search for "300,000")