Faster file move method other than File.Move

Posted 2020-06-03 08:45

I have a console application that is going to take about 625 days to complete, unless there is a way to make it faster.

First off, I am working in a directory that contains around 4,000,000 files, if not more. I'm also working with a database that has a row for each file, and then some.

The SQL work is relatively fast; the bottleneck is File.Move(), where each move takes 18 seconds to complete.

Is there a faster way than File.Move()?

This is the bottleneck:

File.Move(Path.Combine(location, fileName), Path.Combine(rootDir, fileYear, fileMonth, fileName));

All of the other code runs pretty fast. All I need to do is move one file to a new location and then update the database location field.

I can show other code if needed, but really the above is the only current bottleneck.

3 Answers
手持菜刀,她持情操
#2 · 2020-06-03 09:13

You can move files in parallel, and Directory.EnumerateFiles gives you a lazily enumerated sequence of files (of course, I have not tested it with 4,000,000 files):

var numberOfConcurrentMoves = 2;
var moves = new List<Task>();
var sourceDirectory = "source-directory";
var destinationDirectory = "destination-directory";

foreach (var filePath in Directory.EnumerateFiles(sourceDirectory))
{
    var move = Task.Run(() =>
    {
        File.Move(filePath, Path.Combine(destinationDirectory, Path.GetFileName(filePath)));

        // UPDATE DB
    });

    moves.Add(move);

    // Once the batch is full, wait for it to finish so that only
    // a limited number of moves run at the same time.
    if (moves.Count >= numberOfConcurrentMoves)
    {
        Task.WaitAll(moves.ToArray());
        moves.Clear();
    }
}

Task.WaitAll(moves.ToArray());
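An alternative to the manual batching above (my own suggestion, not part of the original answer) is Parallel.ForEach with a bounded degree of parallelism, which handles the partitioning and waiting for you. The directory names are the same placeholders used in the answer:

```csharp
using System.IO;
using System.Threading.Tasks;

class ParallelMover
{
    static void Main()
    {
        var sourceDirectory = "source-directory";           // placeholder, as in the answer
        var destinationDirectory = "destination-directory"; // placeholder, as in the answer

        // Parallel.ForEach consumes the lazy enumeration and caps how many
        // moves run concurrently via MaxDegreeOfParallelism.
        Parallel.ForEach(
            Directory.EnumerateFiles(sourceDirectory),
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            filePath =>
            {
                File.Move(filePath,
                    Path.Combine(destinationDirectory, Path.GetFileName(filePath)));

                // UPDATE DB
            });
    }
}
```

Whether parallelism helps at all depends on the disk; on a single spinning drive, concurrent moves can even be slower than sequential ones.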
smile是对你的礼貌
#3 · 2020-06-03 09:18

It turns out that switching from File.Move to creating a FileInfo and calling .MoveTo() increased the speed significantly.

It will now run in about 35 days, as opposed to 625 days.

FileInfo fileInfo = new FileInfo(Path.Combine(location, fileName));
fileInfo.MoveTo(Path.Combine(rootDir, fileYear, fileMonth, fileName));
SAY GOODBYE
#4 · 2020-06-03 09:20

18 seconds isn't really unusual. NTFS does not perform well when you have a lot of files in a single directory: when you ask for a file, it has to do a linear search of its directory data structure. With 1,000 files, that doesn't take too long. With 10,000 files, you notice it. With 4 million files... yeah, it takes a while.

You can probably make this even faster if you pre-load all of the directory entries into memory. Then, rather than constructing a FileInfo for each file, you just look it up in your dictionary.

Something like:

var dirInfo = new DirectoryInfo(path);
// Read every entry (files and subdirectories) in one directory scan
var files = dirInfo.GetFileSystemInfos();
var cache = new Dictionary<string, FileSystemInfo>();
foreach (var f in files)
{
    cache.Add(f.FullName, f);
}

Now when you get a name from the database, you can just look it up in the dictionary. That might very well be faster than trying to get it from the disk each time.
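Putting this together with the MoveTo approach from the earlier answer, a lookup-then-move loop might look like the sketch below. The names `location`, `rootDir`, `fileYear`, `fileMonth`, and `fileName` come from the question; keying the cache by bare file name (rather than full path) and the `RowsFromDatabase` helper are my assumptions for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

class CachedMover
{
    static void Main()
    {
        var location = "source-directory"; // assumed source, per the question
        var rootDir = "destination-root";  // assumed destination root

        // One directory scan up front, keyed by file name
        // (assumption: the database stores bare file names).
        var cache = new DirectoryInfo(location)
            .EnumerateFiles()
            .ToDictionary(f => f.Name);

        foreach (var (fileName, fileYear, fileMonth) in RowsFromDatabase())
        {
            if (cache.TryGetValue(fileName, out var fileInfo))
            {
                var destDir = Path.Combine(rootDir, fileYear, fileMonth);
                Directory.CreateDirectory(destDir); // MoveTo fails if it doesn't exist
                fileInfo.MoveTo(Path.Combine(destDir, fileName));
                // UPDATE DB location field here
            }
        }
    }

    // Hypothetical stand-in for the database query described in the question.
    static IEnumerable<(string fileName, string fileYear, string fileMonth)> RowsFromDatabase()
        => Enumerable.Empty<(string, string, string)>();
}
```

The up-front scan costs one pass over the 4 million entries, after which each lookup is a constant-time dictionary hit instead of a fresh directory search.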
