Show progress when searching all files in a directory

Published 2020-05-24 06:45

Question:

I previously asked the question Get all files and directories in specific path fast in order to find files as fast as possible. I am using that solution to find the file names that match a regular expression.

I was hoping to show a progress bar because, with some really large and slow hard drives, it still takes about a minute to execute. The solution I posted at the other link does not let me know how many files remain to be traversed, which I would need in order to show a progress bar.

One solution I was considering was to obtain the size of the directory I was planning to traverse. For example, when I right-click the folder C:\Users I am able to get an estimate of how big that directory is. If I knew the total size, then I could show the progress by adding up the size of every file that I find. In other words, progress = (current sum of file sizes) / directory size.

For some reason I have not been able to efficiently get the size of that directory.

Some of the questions on Stack Overflow use the following approach:
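(The code snippet referenced here does not appear on the page. As a hedged sketch, the approach those Stack Overflow answers typically use looks something like the following, built on System.IO.DirectoryInfo; this is not the exact code from the question.)

```csharp
using System;
using System.IO;
using System.Linq;

class DirectorySizeSketch
{
    static void Main()
    {
        var dir = new DirectoryInfo(@"C:\Users");

        // GetFiles with SearchOption.AllDirectories throws an
        // UnauthorizedAccessException as soon as it reaches a folder the
        // current user cannot read, which matches the exception described
        // below when running it against the whole C drive.
        long size = dir.GetFiles("*", SearchOption.AllDirectories)
                       .Sum(fi => fi.Length);

        Console.WriteLine("Total size: {0} bytes", size);
    }
}
```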

But note that I get an exception and am not able to enumerate the files. I am curious about trying that method on my C drive.

In that screenshot I was trying to count the number of files in order to show progress. I will probably not be able to get the number of files efficiently using that approach. I was just trying some of the answers on Stack Overflow where people asked how to get the number of files in a directory, and also how to get the size of a directory.

Answer 1:

Solving this is going to leave you with one of a few possibilities...

  1. Not displaying progress at all
  2. Using an up-front cost to compute (like Windows)
  3. Performing the operation while computing the cost

If speed is that important and you expect large directory trees, I would lean toward the last of these options. I've added an answer on the linked question Get all files and directories in specific path fast that demonstrates a faster means of counting files and sizes than you are currently using. To combine this into a multi-threaded piece of code for option #3, the following can be done...

using System;
using System.Threading;

static void Main()
{
    const string directory = @"C:\Program Files";
    // Create an enumeration of the files we will want to process that simply accumulates these values...
    long total = 0;
    var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    fcounter.RaiseOnAccessDenied = false;
    fcounter.FileFound +=
        (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                }
            };

    // Start a high-priority thread to perform the accumulation
    Thread t = new Thread(fcounter.Find)
        {
            IsBackground = true, 
            Priority = ThreadPriority.AboveNormal, 
            Name = "file enum"
        };
    t.Start();

    // Allow the accumulator thread to get a head-start on us
    do { Thread.Sleep(100); }
    while (total < 100 && t.IsAlive);

    // Now we can process the files normally and update a percentage
    long count = 0, percentage = 0;
    var task = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    task.RaiseOnAccessDenied = false;
    task.FileFound +=
        (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    ProcessFile(e.FullPath);
                    // Update the percentage complete, guarding against a
                    // zero total if the counting thread has found nothing yet.
                    long found = Math.Max(1, Interlocked.Read(ref total));
                    long progress = ++count * 100 / found;
                    if (progress > percentage && progress <= 100)
                    {
                        percentage = progress;
                        Console.WriteLine("{0}% complete.", percentage);
                    }
                }
            };

    task.Find();
}

// Hypothetical per-file work; replace with your own processing,
// e.g. matching the file name against your regular expression.
static void ProcessFile(string path)
{
}

The FindFile class implementation can be found at FindFile.cs.

Depending on how expensive your file-processing task is (the ProcessFile function above), you should see the progress percentage advance very cleanly on large volumes of files. If your file processing is extremely fast, you may want to increase the lag between the start of enumeration and the start of processing.

The event argument is of type FindFile.FileFoundEventArgs and is a mutable class, so be sure you don't keep a reference to the event argument, as its values will change.

Ideally you will want to add error handling and probably the ability to abort both enumerations. Aborting the enumeration can be done by setting "CancelEnumeration" on the event argument.
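As a sketch of that cancellation, assuming the same FindFile API used above, you could wire a flag into the event handler; `cancelRequested` here is a hypothetical field you would set from, say, a Cancel button:

```csharp
using System;
using CSharpTest.Net.IO;

class CancelSketch
{
    // Hypothetical flag, set from elsewhere (e.g. a UI Cancel button).
    static volatile bool cancelRequested;

    static void Main()
    {
        var finder = new FindFile(@"C:\Program Files", "*", true, true, true);
        finder.RaiseOnAccessDenied = false;
        finder.FileFound +=
            (o, e) =>
            {
                // Stop the enumeration as soon as cancellation is requested.
                if (cancelRequested)
                    e.CancelEnumeration = true;
            };
        finder.Find();
    }
}
```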



Answer 2:

What you are asking may not be possible because of how the file system stores its data.

It is a file system limitation

There is no way to know the total size of a folder, nor the total file count inside a folder, without enumerating the files one by one. Neither piece of information is stored in the file system.

This is why Windows shows a message like "Calculating space" before copying folders with a lot of files... it is actually counting how many files are inside the folder and summing their sizes so that it can show a progress bar during the real copy operation. (It also uses this information to know whether the destination has enough space to hold all the data being copied.)

Also, when you right-click a folder and go to Properties, note that it takes some time to count all the files and sum their sizes. That is caused by the same limitation.

To know how large a folder is, or how many files are inside it, you must enumerate the files one by one.
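A minimal sketch of such a one-by-one walk, using only System.IO (not the FindFile library from the other answer); real code should also watch out for reparse points and symlink cycles:

```csharp
using System;
using System.IO;

class EnumerateSketch
{
    static long fileCount, totalBytes;

    static void Walk(string path)
    {
        try
        {
            // Count and size every file in this folder...
            foreach (var file in Directory.EnumerateFiles(path))
            {
                fileCount++;
                totalBytes += new FileInfo(file).Length;
            }
            // ...then recurse into each subfolder.
            foreach (var sub in Directory.EnumerateDirectories(path))
                Walk(sub);
        }
        catch (UnauthorizedAccessException)
        {
            // Skip folders we cannot read instead of aborting the scan.
        }
    }

    static void Main()
    {
        Walk(@"C:\Users");
        Console.WriteLine("{0} files, {1} bytes", fileCount, totalBytes);
    }
}
```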

Fast files enumeration

Of course, as you already know, there are a lot of ways of doing the enumeration itself... but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project on CodePlex: MFT Scanner in VB.NET (the code is actually in C#... I don't know why the author says it is VB.NET)... it found all the files on my IDE SATA (not SSD) drive in less than 15 seconds, and found 311,000 files.

You will have to filter the files by path, so that only the files inside the path you are looking for are returned. But that is the easy part of the job!
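Since an MFT scan returns every file on the volume, that filter can be a simple prefix check on the full path. A sketch, assuming the scanner yields full paths as strings (`allPaths` here is hypothetical sample data standing in for the scanner's output):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FilterSketch
{
    static void Main()
    {
        // Stand-in for the paths an MFT scan of the whole volume returns.
        var allPaths = new List<string>
        {
            @"C:\Users\me\doc.txt",
            @"C:\Windows\notepad.exe"
        };

        // Keep only files under the directory of interest; NTFS paths are
        // case-insensitive, hence OrdinalIgnoreCase.
        string root = @"C:\Users\";
        var inside = allPaths.Where(p =>
            p.StartsWith(root, StringComparison.OrdinalIgnoreCase));

        foreach (var p in inside)
            Console.WriteLine(p); // prints only C:\Users\me\doc.txt
    }
}
```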

Hope this helps in your project... good luck!