C# - How to list the files in a sub-directory fast

Posted 2019-06-07 00:59

Question:

I am trying to list the files in all the sub-directories of a root directory with the approach below, but it takes a very long time when the number of files runs into the millions. Is there a better way of doing this?

I am using .NET 3.5, so I can't use the lazy Directory.EnumerateFiles enumerator that was added in .NET 4.0 :-(

        // ******************* Main *************
        DirectoryInfo dir = new DirectoryInfo(path);
        DirectoryInfo[] subDir = dir.GetDirectories();
        foreach (DirectoryInfo di in subDir) //call for each sub directory
        {
             PopulateList(di.FullName, false);
        }

        // *******************************************
        static void PopulateList(string directory, bool IsRoot)
        {

            System.Diagnostics.ProcessStartInfo procStartInfo = new System.Diagnostics.ProcessStartInfo("cmd", "/c " + "dir /s/b \"" + directory + "\"");
            procStartInfo.RedirectStandardOutput = true;
            procStartInfo.UseShellExecute = false;
            procStartInfo.CreateNoWindow = true;
            System.Diagnostics.Process proc = new System.Diagnostics.Process();
            proc.StartInfo = procStartInfo;
            proc.Start();

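            // Name the output file after the last segment of the directory path.
            string fileName = Path.GetFileName(directory);

            // "using" closes the writer even if reading fails; the buffer is flushed
            // once on close instead of forcing a disk write after every line.
            using (StreamWriter writer = new StreamWriter(fileName + ".lst"))
            {
                string line;
                while ((line = proc.StandardOutput.ReadLine()) != null)
                {
                    writer.WriteLine(line);
                }
            }
            proc.WaitForExit();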
        }

Answer 1:

Remove all the Process-related code and try the Directory.GetDirectories() and Directory.GetFiles() methods instead:

public IEnumerable<string> GetAllFiles(string rootDirectory)
{
    foreach(var directory in Directory.GetDirectories(
                                            rootDirectory, 
                                            "*", 
                                            SearchOption.AllDirectories))
    {
        foreach(var file in Directory.GetFiles(directory))
        {
            yield return file;
        }
    }
}

From MSDN, SearchOption.AllDirectories:

Includes the current directory and all the subdirectories in a search operation. This option includes reparse points like mounted drives and symbolic links in the search.
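Because of that reparse-point behaviour, a junction or symbolic link that points back up the tree can make the search visit the same files repeatedly. As a rough sketch only (the class name SafeFileLister and its method are made up for illustration, not part of any library), the traversal can be done by hand with an explicit stack so that reparse points are skipped. Note one difference from the method above: this version also returns files that sit directly in the root directory. It uses only APIs that exist in .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeFileLister
{
    // Hypothetical helper: walks the tree with an explicit stack instead of
    // SearchOption.AllDirectories, so each directory can be checked first.
    public static IEnumerable<string> GetAllFiles(string rootDirectory)
    {
        Stack<string> pending = new Stack<string>();
        pending.Push(rootDirectory);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subDirectories;
            try
            {
                files = Directory.GetFiles(current);
                subDirectories = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // Skip directories we are not allowed to read.
            }

            foreach (string file in files)
            {
                yield return file;
            }

            foreach (string subDirectory in subDirectories)
            {
                // Do not descend into junctions, symbolic links or mount points.
                FileAttributes attributes = File.GetAttributes(subDirectory);
                if ((attributes & FileAttributes.ReparsePoint) == 0)
                {
                    pending.Push(subDirectory);
                }
            }
        }
    }
}
```

The arrays returned per directory are still built eagerly, so this bounds memory by the largest single directory rather than by the whole tree.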



Answer 2:

It will definitely be faster to call DirectoryInfo.GetFiles in a loop for each directory than to spawn a new process per directory and read its output.
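To make that concrete, here is a minimal sketch of what the question's per-directory ".lst" output could look like without any child processes. The names ListWriter, WriteLists and the outputDirectory parameter are my own assumptions, not something from the question. Unlike "dir /s/b", it lists files only, not directory names.

```csharp
using System.IO;

public static class ListWriter
{
    // Writes one "<subdirectory name>.lst" file per immediate subdirectory of
    // rootPath into outputDirectory (a hypothetical parameter for this sketch).
    public static void WriteLists(string rootPath, string outputDirectory)
    {
        DirectoryInfo root = new DirectoryInfo(rootPath);
        foreach (DirectoryInfo subDirectory in root.GetDirectories())
        {
            string listPath = Path.Combine(outputDirectory, subDirectory.Name + ".lst");
            using (StreamWriter writer = new StreamWriter(listPath))
            {
                // In-process replacement for "dir /s/b", restricted to files.
                foreach (FileInfo file in subDirectory.GetFiles("*", SearchOption.AllDirectories))
                {
                    writer.WriteLine(file.FullName);
                }
            }
        }
    }
}
```

GetFiles builds the whole array for a subdirectory before returning, so with millions of files under one subdirectory the memory cost is still significant.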



Answer 3:

With millions of files you are actually running into a filesystem limitation (see this and search for "300,000"), so take that into account.

As for optimizations, I think you'd really want to iterate lazily, so you'll have to P/Invoke into FindFirstFile/FindNextFile.
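A rough, Windows-only sketch of that P/Invoke approach follows. The Win32 functions and the WIN32_FIND_DATA layout are the real kernel32 ones, but the wrapper class NativeFileEnumerator, its EnumerateFiles method and the constant names are invented here for illustration. It assumes paths shorter than MAX_PATH and silently skips directories it cannot open, which a real implementation would probably want to report.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

public static class NativeFileEnumerator
{
    private const uint FileAttributeDirectory = 0x10;     // FILE_ATTRIBUTE_DIRECTORY
    private const uint FileAttributeReparsePoint = 0x400; // FILE_ATTRIBUTE_REPARSE_POINT
    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Yields full file paths one at a time; nothing is buffered per directory.
    public static IEnumerable<string> EnumerateFiles(string root)
    {
        Stack<string> pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string directory = pending.Pop();
            WIN32_FIND_DATA data;
            IntPtr handle = FindFirstFile(Path.Combine(directory, "*"), out data);
            if (handle == InvalidHandle)
            {
                continue; // Unreadable or vanished directory: skipped in this sketch.
            }

            try
            {
                do
                {
                    string name = data.cFileName;
                    if (name == "." || name == "..")
                    {
                        continue;
                    }

                    string fullPath = Path.Combine(directory, name);
                    if ((data.dwFileAttributes & FileAttributeDirectory) != 0)
                    {
                        // Queue real sub-directories; skip junctions and symlinks.
                        if ((data.dwFileAttributes & FileAttributeReparsePoint) == 0)
                        {
                            pending.Push(fullPath);
                        }
                    }
                    else
                    {
                        yield return fullPath;
                    }
                }
                while (FindNextFile(handle, out data));
            }
            finally
            {
                FindClose(handle); // Runs even if the caller stops iterating early.
            }
        }
    }
}
```

A caller can then stream results straight to disk, for example by looping over NativeFileEnumerator.EnumerateFiles(path) and writing each path to a StreamWriter, without ever holding the full list in memory. This works on .NET 3.5 because iterators and P/Invoke do not depend on the .NET 4.0 enumeration APIs.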



Answer 4:

Check out the existing Directory.GetFiles overload that takes a SearchOption.
For example:

var paths = Directory.GetFiles(root, "*", SearchOption.AllDirectories);

And yes, it will take a lot of time, but I don't think you can improve its performance using only .NET classes.



Answer 5:

Assuming that your millions of files are spread across multiple sub-directories and you're using .NET 4.0, you could look at the parallel extensions.

Using a parallel foreach loop to process the list of sub-directories could make things a lot faster.

The new parallel extensions are also a lot safer and easier to use than attempting multi-threading at a lower-level.

The one thing to look out for is limiting the number of concurrent operations to something sensible.
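As a hedged sketch of what that could look like on .NET 4.0 or later (so not on the asker's .NET 3.5), the loop below processes the immediate sub-directories in parallel and caps the concurrency with ParallelOptions.MaxDegreeOfParallelism. The class name ParallelFileLister, the method ListFiles and the maxParallelism parameter are assumptions for illustration. Whether it is actually faster depends on the storage: a single spinning disk may gain little or even get slower.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelFileLister
{
    // Lists all files below each immediate subdirectory of rootPath,
    // scanning at most maxParallelism subdirectories at the same time.
    public static List<string> ListFiles(string rootPath, int maxParallelism)
    {
        ConcurrentBag<string> results = new ConcurrentBag<string>();

        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = maxParallelism;

        Parallel.ForEach(Directory.GetDirectories(rootPath), options, subDirectory =>
        {
            // EnumerateFiles (.NET 4.0+) streams entries instead of building an array.
            foreach (string file in Directory.EnumerateFiles(subDirectory, "*", SearchOption.AllDirectories))
            {
                results.Add(file); // ConcurrentBag is safe to fill from several threads.
            }
        });

        return new List<string>(results);
    }
}
```

The order of the returned paths is not deterministic, so sort the list afterwards if the order matters.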