Method for copying large amounts of data in C#

Posted 2019-05-12 11:14

Question:

I am using the following method to copy the contents of a directory to a different directory.

public void DirCopy(string SourcePath, string DestinationPath)
    {
        if (Directory.Exists(DestinationPath))
        {
            System.IO.DirectoryInfo downloadedMessageInfo = new DirectoryInfo(DestinationPath);

            foreach (FileInfo file in downloadedMessageInfo.GetFiles())
            {
                file.Delete();
            }
            foreach (DirectoryInfo dir in downloadedMessageInfo.GetDirectories())
            {
                dir.Delete(true);
            }
        }



        //=================================================================================
        string[] directories = System.IO.Directory.GetDirectories(SourcePath, "*.*", SearchOption.AllDirectories);

        Parallel.ForEach(directories, dirPath =>
        {
            Directory.CreateDirectory(dirPath.Replace(SourcePath, DestinationPath));
        });

        string[] files = System.IO.Directory.GetFiles(SourcePath, "*.*", SearchOption.AllDirectories);

        Parallel.ForEach(files, newPath =>
        {
            File.Copy(newPath, newPath.Replace(SourcePath, DestinationPath), true);
        });

    }

My only issue is that there is quite a bit of data in the source path and the program becomes non responsive while the copying is taking place.

I am wondering what my options are for copying data. I did some research and someone had recommended to use a buffer.

I have not really seen any solution that I understand particularly well so any help/resources that are clear and concise would be great.

Thanks for any help!

Answer 1:

Performing long-running tasks on the message (UI) thread in Windows Forms makes the form unresponsive until the task is done. You need to move the work to another thread to prevent that. It can get complicated, but a BackgroundWorker is a good fit:

_Worker = new BackgroundWorker();
_Worker.WorkerReportsProgress = true;
_Worker.DoWork += Worker_DoWork;
_Worker.ProgressChanged += Worker_ProgressChanged;
_Worker.RunWorkerAsync();

A method that performs the task:

private void Worker_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = sender as BackgroundWorker;
    worker.ReportProgress(1);

    // do stuff

    worker.ReportProgress(100);
}

And a method to report progress:

private void Worker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    switch (e.ProgressPercentage)
    {
        case 1:
            // make your status bar visible
            break;

        case 100:
            // hide it again
            break;
    }
}

You can use a marquee progress bar, but if you want to report the actual percentage back, calculating file sizes and progress in your Worker_DoWork method can become complicated and is another issue.

https://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker(v=vs.110).aspx
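
If a per-file percentage is good enough, the calculation stays fairly simple. The following is only a sketch under assumptions: `CopyProgress`, `Percent`, and `CopyWithProgress` are hypothetical names, progress is measured by file count rather than by bytes, and `WorkerReportsProgress` is assumed to be `true` as in the setup above.

```csharp
using System;
using System.ComponentModel;
using System.IO;

public static class CopyProgress
{
    // Hypothetical helper: share of files copied so far, clamped to 0..100.
    public static int Percent(int copied, int total)
    {
        if (total <= 0) return 100; // nothing to copy counts as finished
        return (int)Math.Min(100L, copied * 100L / total);
    }

    // Meant to be called from Worker_DoWork. Both paths are assumptions
    // passed in by the caller; files are copied one at a time.
    public static void CopyWithProgress(BackgroundWorker worker, string sourcePath, string destinationPath)
    {
        string[] files = Directory.GetFiles(sourcePath, "*", SearchOption.AllDirectories);
        for (int i = 0; i < files.Length; i++)
        {
            // Rebuild the path relative to the source root under the destination.
            string relative = files[i].Substring(sourcePath.Length)
                .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            string target = Path.Combine(destinationPath, relative);

            Directory.CreateDirectory(Path.GetDirectoryName(target));
            File.Copy(files[i], target, true);

            // Raises ProgressChanged on the thread that created the worker.
            worker.ReportProgress(Percent(i + 1, files.Length));
        }
    }
}
```

With this variant, Worker_ProgressChanged would assign e.ProgressPercentage to a progress bar's Value instead of switching on the values 1 and 100.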



Answer 2:

If your goal is to stop the application from becoming non-responsive, the suggestion to use buffers will not fix your problem. Instead, look at using a separate Thread to copy your directories.

Even better, use a BackgroundWorker, which has the added benefit of being able to report progress.



Answer 3:

A quick fix for your issue is to use a background thread at the calling code like this:

var source_directory = "c:\\source";
var destination_directory= "c:\\destination";
Task.Run(() => DirCopy(source_directory, destination_directory));

This example uses the Task.Run method which uses one of the thread-pool threads to execute the code.

This makes sure that the UI thread is free to update the UI and to respond to user input.
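
One caveat with a fire-and-forget Task.Run is that an exception thrown inside DirCopy goes unobserved, and the caller never learns when the copy has finished. A minimal sketch of awaiting the task instead, assuming a hypothetical `CopyRunner.RunCopyAsync` helper whose `copyAction` parameter stands in for `() => DirCopy(source_directory, destination_directory)`:

```csharp
using System;
using System.Threading.Tasks;

public static class CopyRunner
{
    // Runs the copy on a thread-pool thread and reports success or failure.
    // copyAction is a placeholder for the actual DirCopy call.
    public static async Task<bool> RunCopyAsync(Action copyAction)
    {
        try
        {
            // await rethrows an exception raised inside copyAction here,
            // so it can be handled instead of being silently lost.
            await Task.Run(copyAction);
            return true;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Copy failed: " + ex.Message);
            return false;
        }
    }
}
```

From an async event handler this could be called as `bool ok = await CopyRunner.RunCopyAsync(() => DirCopy(source_directory, destination_directory));`, and the code after the await runs only once the copy has completed.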



Answer 4:

It's not clear what version of the compiler / framework you're using, but you could use asynchronous file operations and not have to worry about threading. You could also benefit from using the streaming versions EnumerateDirectories and EnumerateFiles if you have large file hierarchies.

public async Task DirCopy(string SourcePath, string DestinationPath)
{
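    //slightly different from your code, in that the destination directory is simply removed recursively
    //... guarded, because Directory.Delete throws if the directory does not exist
    if (Directory.Exists(DestinationPath))
    {
        Directory.Delete(DestinationPath, true);
    }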

    //enumerate files returns earlier than get files for large file hierarchies
    //... because it is a streaming IEnumerable instead of an array
    foreach (var sourcePath in System.IO.Directory.EnumerateFiles(SourcePath, "*.*", SearchOption.AllDirectories))
    {
        var destinationPath = sourcePath.Replace(SourcePath, DestinationPath);

        //slightly different from your code, in that directories are created as needed
        //... however, this would mean empty directories are not copied
        Directory.CreateDirectory(Path.GetDirectoryName(destinationPath));

        using (var source = File.Open(sourcePath, FileMode.Open, FileAccess.Read))
        using (var destination = File.Create(destinationPath))
        {
            //async copy of the file frees the current thread
            //... e.g. for the UI thread to process UI updates
            await source.CopyToAsync(destination);
        }
    }
}
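
A note on the streams used above: as far as I know, a FileStream obtained from File.Open or File.Create is not opened for overlapped (asynchronous) I/O, so CopyToAsync still frees the calling thread but does the actual reads and writes on thread-pool threads. The sketch below shows one way to request asynchronous I/O explicitly. `AsyncFileCopy` and `CopyFileAsync` are hypothetical names, and the buffer size is an arbitrary assumption rather than a tuned value.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileCopy
{
    // Copies a single file using streams opened with FileOptions.Asynchronous.
    public static async Task CopyFileAsync(string sourceFile, string destinationFile)
    {
        const int bufferSize = 81920; // assumption: not a measured optimum

        using (var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read,
            FileShare.Read, bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
        using (var destination = new FileStream(destinationFile, FileMode.Create, FileAccess.Write,
            FileShare.None, bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            // The awaiting thread is released while the copy is in flight.
            await source.CopyToAsync(destination, bufferSize);
        }
    }
}
```

Inside the loop above, the two using statements and the await could then be replaced by `await AsyncFileCopy.CopyFileAsync(sourcePath, destinationPath);`.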


Answer 5:

Assuming that the real problem is your program becoming non-responsive...

Your program probably stops responding because the thread performing the copies is the same thread that responds to user input. If you want the copies to continue in the background while the program remains responsive, you must perform them asynchronously. (I assume that you're using WinForms or WPF based on your context.)

The typical approach is to simply spin up a background worker, ship the copy job off to it, and let it go to town while your GUI thread responds to user input. There are other, more sophisticated techniques with better trade-offs as well, but I suspect this will suffice for your scenario based on what you've described.

(Your Parallel.ForEach doesn't do the trick because the thread that triggers it will not continue until the Parallel.ForEach has finished executing)
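
The blocking behaviour is easy to observe in isolation. In this small sketch (`BlockingDemo` is a hypothetical name and the Thread.Sleep stands in for a file copy), every iteration has already finished by the time Parallel.ForEach returns, which is exactly why calling it on the UI thread freezes the form:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingDemo
{
    // Returns how many iterations had completed at the moment
    // Parallel.ForEach handed control back to the caller.
    public static int CountCompletedWhenForEachReturns(int items)
    {
        int completed = 0;

        Parallel.ForEach(Enumerable.Range(0, items), _ =>
        {
            Thread.Sleep(10); // stand-in for copying one file
            Interlocked.Increment(ref completed);
        });

        // Reached only after all iterations are done.
        return completed;
    }
}
```

The work is spread across several threads, but the caller still waits for all of it.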



Answer 6:

Thanks everyone, I really appreciate all the input and seeing the different ways to accomplish this. For the time being I decided to just use Task.Run, but I am going to look into BackgroundWorker and async operations.

Thanks again everyone!

For reference I just did

Task.Run(()=>{ DirCopy("source","destination"); });

DirCopy

public void DirCopy(string SourcePath, string DestinationPath)
    {
        if (Directory.Exists(DestinationPath))
        {
            System.IO.DirectoryInfo downloadedMessageInfo = new DirectoryInfo(DestinationPath);

            foreach (FileInfo file in downloadedMessageInfo.GetFiles())
            {
                file.Delete();
            }
            foreach (DirectoryInfo dir in downloadedMessageInfo.GetDirectories())
            {
                dir.Delete(true);
            }
        }



        //=================================================================================
        string[] directories = System.IO.Directory.GetDirectories(SourcePath, "*.*", SearchOption.AllDirectories);
        string[] files = System.IO.Directory.GetFiles(SourcePath, "*.*", SearchOption.AllDirectories);

        totalPB.Minimum = 0;
        // totalPB advances once per copied file, so its maximum is the file count
        totalPB.Maximum = files.Length;
        totalPB.Value = 0;
        totalPB.Step = 1;

        subTotalPB.Minimum = 0;
        subTotalPB.Maximum = directories.Length;
        subTotalPB.Value = 0;
        subTotalPB.Step = 1;

        Parallel.ForEach(directories, dirPath =>
        {
            Directory.CreateDirectory(dirPath.Replace(SourcePath, DestinationPath));
            subTotalPB.PerformStep();
        });

        Task.Run(() => 
        {

            Parallel.ForEach(files, newPath =>
            {
                File.Copy(newPath, newPath.Replace(SourcePath, DestinationPath), true);
                totalPB.PerformStep();
            });

        });



    }
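
One thing to watch in the version above: because DirCopy runs inside Task.Run, the progress bars are touched from thread-pool threads, which WinForms controls do not support. A possible alternative, sketched here under assumptions, is to report progress through IProgress<int> and let the UI thread update the bar. `ProgressCopy` and `CopyFiles` are hypothetical names, and progress is expressed as a percentage of files copied.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ProgressCopy
{
    // Copies every file under sourcePath to destinationPath in parallel.
    // progress may be null; otherwise it receives a 0..100 percentage.
    public static void CopyFiles(string sourcePath, string destinationPath, IProgress<int> progress)
    {
        string[] files = Directory.GetFiles(sourcePath, "*", SearchOption.AllDirectories);
        int copied = 0;

        Parallel.ForEach(files, file =>
        {
            // Rebuild the path relative to the source root under the destination.
            string relative = file.Substring(sourcePath.Length)
                .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            string target = Path.Combine(destinationPath, relative);

            Directory.CreateDirectory(Path.GetDirectoryName(target));
            File.Copy(file, target, true);

            // Thread-safe counter; no UI control is touched on this thread.
            int done = Interlocked.Increment(ref copied);
            progress?.Report((int)(done * 100L / files.Length));
        });
    }
}
```

In the form, something like `var progress = new Progress<int>(p => totalPB.Value = p);` created on the UI thread, followed by `await Task.Run(() => ProgressCopy.CopyFiles("source", "destination", progress));`, would keep all progress bar updates on the UI thread, since Progress<T> invokes its callback on the synchronization context captured when it was constructed. This assumes totalPB.Maximum is 100.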