Faster file move method other than File.Move

Posted 2020-06-03 09:15

Question:

I have a console application that is going to take about 625 days to complete, unless there is a way to make it faster.

First off, I am working in a directory that contains around 4,000,000 files, if not more. I'm also working with a database that has a row for each file, and then some.

Working with the SQL is relatively fast. The bottleneck is File.Move(): each move takes 18 seconds to complete.

Is there a faster way than File.Move()?

This is the bottleneck:

File.Move(Path.Combine(location, fileName), Path.Combine(rootDir, fileYear, fileMonth, fileName));

All of the other code runs pretty fast. All I need to do is move one file to a new location and then update the database location field.

I can show other code if needed, but really the above is the only current bottleneck.

Answer 1:

It turns out switching from File.Move to setting up a FileInfo and using .MoveTo increased the speed significantly.

It will run in about 35 days now as opposed to 625 days.

FileInfo fileinfo = new FileInfo(Path.Combine(location, fileName));
fileinfo.MoveTo(Path.Combine(rootDir, fileYear, fileMonth, fileName));
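
For anyone adapting the two lines above, here is a minimal sketch of the same FileInfo.MoveTo call wrapped in a helper. The class name FileMover and the method MoveToDatedFolder are hypothetical, and creating the year/month folder first is an assumption about the surrounding code, which the question does not show.

```csharp
using System;
using System.IO;

public static class FileMover
{
    // Hypothetical helper: moves one file into rootDir/fileYear/fileMonth
    // and returns the new full path so the database row can be updated.
    public static string MoveToDatedFolder(string location, string fileName,
                                           string rootDir, string fileYear, string fileMonth)
    {
        string targetDir = Path.Combine(rootDir, fileYear, fileMonth);

        // Assumption: the target folder may not exist yet.
        // CreateDirectory does nothing when it already exists.
        Directory.CreateDirectory(targetDir);

        string destination = Path.Combine(targetDir, fileName);
        var fileInfo = new FileInfo(Path.Combine(location, fileName));

        // MoveTo throws an IOException if the destination file already exists.
        fileInfo.MoveTo(destination);
        return destination;
    }
}
```

The returned path is what you would write back to the location field in the database.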


Answer 2:

18 seconds isn't really unusual. NTFS does not perform well when you have a lot of files in a single directory. Although NTFS keeps directory entries in a B-tree index rather than a flat list, every lookup, insert, and delete still has to work through that index, and the cost grows with the size of the directory. With 1,000 files, that doesn't take too long. With 10,000 files you notice it. With 4 million files . . . yeah, it takes a while.

You can probably do this even faster if you pre-load all of the directory entries into memory. Then rather than calling the FileInfo constructor for each file, you just look it up in your dictionary.

Something like:

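// Requires: using System; using System.Collections.Generic; using System.IO;
// 'path' is the source directory that holds all of the files.
var dirInfo = new DirectoryInfo(path);
// Enumerate the directory once and index every file by its full path.
// FileInfo (unlike the FileSystemInfo base class) exposes MoveTo.
// Windows paths are case-insensitive, so use a matching comparer.
var cache = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
foreach (var f in dirInfo.EnumerateFiles())
{
    cache.Add(f.FullName, f);
}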

Now when you get a name from the database, you can just look it up in the dictionary. That might very well be faster than trying to get it from the disk each time.
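
To make the lookup step concrete, here is a rough sketch of how a cached entry might be used for the move itself. The names CachedMover, BuildCache and TryMove are hypothetical, and keying the dictionary by file name rather than full path is an assumption that the database stores only the file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CachedMover
{
    // Enumerate the source directory once and index each file by its name.
    public static Dictionary<string, FileInfo> BuildCache(string sourceDir)
    {
        var cache = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo f in new DirectoryInfo(sourceDir).EnumerateFiles())
        {
            cache[f.Name] = f;
        }
        return cache;
    }

    // Returns false when the name from the database has no matching file on disk.
    public static bool TryMove(Dictionary<string, FileInfo> cache,
                               string fileName, string destinationDir)
    {
        if (!cache.TryGetValue(fileName, out FileInfo file))
        {
            return false;
        }

        Directory.CreateDirectory(destinationDir);
        file.MoveTo(Path.Combine(destinationDir, fileName));

        // The file no longer lives in the source directory, so drop it from the cache.
        cache.Remove(fileName);
        return true;
    }
}
```

Whether this actually beats constructing a FileInfo per file on a 4-million-file directory is something you would have to measure; the sketch only shows the shape of the approach.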



Answer 3:

You can move the files in parallel. Using Directory.EnumerateFiles also gives you a lazily loaded list of files (of course, I have not tested it with 4,000,000 files):

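// Requires: using System.Collections.Generic; using System.IO; using System.Threading.Tasks;
var numberOfConcurrentMoves = 2;
var moves = new List<Task>();
var sourceDirectory = "source-directory";
var destinationDirectory = "destination-directory";

// EnumerateFiles streams paths lazily instead of building one huge array up front.
foreach (var filePath in Directory.EnumerateFiles(sourceDirectory))
{
    // Task.Run queues the move on the thread pool.
    moves.Add(Task.Run(() =>
    {
        File.Move(filePath, Path.Combine(destinationDirectory, Path.GetFileName(filePath)));

        // UPDATE DB
    }));

    // Once a batch is full, wait for it to finish before queuing more.
    if (moves.Count >= numberOfConcurrentMoves)
    {
        Task.WaitAll(moves.ToArray());
        moves.Clear();
    }
}

// Wait for the final, possibly partial, batch.
Task.WaitAll(moves.ToArray());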
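
As a variation on the batching loop above, the same idea can be sketched with Parallel.ForEach and ParallelOptions.MaxDegreeOfParallelism, which caps concurrency without waiting for a whole batch to drain. This is a swapped-in technique, not what the answer itself uses. The names ParallelMover and MoveAll are hypothetical, and, like the original, it is untested at the 4,000,000-file scale.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelMover
{
    // Moves every file in sourceDirectory into destinationDirectory
    // and returns how many files were moved.
    public static int MoveAll(string sourceDirectory, string destinationDirectory,
                              int maxConcurrentMoves)
    {
        Directory.CreateDirectory(destinationDirectory);
        int moved = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrentMoves };

        // EnumerateFiles is lazy, so paths are pulled as workers become free.
        Parallel.ForEach(Directory.EnumerateFiles(sourceDirectory), options, filePath =>
        {
            File.Move(filePath,
                      Path.Combine(destinationDirectory, Path.GetFileName(filePath)));

            // UPDATE DB would go here (placeholder, as in the answer above).

            // Several workers update the counter, so increment it atomically.
            Interlocked.Increment(ref moved);
        });

        return moved;
    }
}
```

A call such as ParallelMover.MoveAll("source-directory", "destination-directory", 2) mirrors the concurrency of 2 used above. On a single spinning disk, higher values may not help, so it is worth measuring before raising the limit.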