I have a console application that is going to take about 625 days to complete, unless there is a way to make it faster.
First off, I am working in a directory that has around 4,000,000 files in it, if not more. I'm working with a database that has a row for each file, and then some.
Working with the SQL is relatively fast; the bottleneck is File.Move(). Each move takes 18 seconds to complete.
Is there a faster way than File.Move()?
This is the bottleneck:
File.Move(Path.Combine(location, fileName), Path.Combine(rootDir, fileYear, fileMonth, fileName));
All of the other code runs pretty fast. All I need to do is move one file to a new location and then update the database location field.
I can show other code if needed, but really the above is the only current bottleneck.
It turns out that switching from File.Move to constructing a FileInfo and calling .MoveTo increased the speed significantly.
It will run in about 35 days now, as opposed to 625 days.
FileInfo fileinfo = new FileInfo(Path.Combine(location, fileName));
fileinfo.MoveTo(Path.Combine(rootDir, fileYear, fileMonth, fileName));
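One caveat worth adding (not shown in the snippet above): MoveTo throws a DirectoryNotFoundException if the year/month folder doesn't exist yet, so it pays to create the target directory first. A minimal sketch, reusing the variable names from the code above with placeholder values:

```csharp
using System.IO;

// Names mirror the question's code; the paths here are placeholders.
string location = @"C:\flat-directory";
string rootDir  = @"C:\sorted";
string fileYear = "2013", fileMonth = "07", fileName = "example.txt";

// CreateDirectory is a no-op if the folder already exists.
string targetDir = Path.Combine(rootDir, fileYear, fileMonth);
Directory.CreateDirectory(targetDir);

var fileInfo = new FileInfo(Path.Combine(location, fileName));
fileInfo.MoveTo(Path.Combine(targetDir, fileName));
```

Calling Directory.CreateDirectory unconditionally is cheap compared to the move itself, so there's no need to check Directory.Exists first.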
18 seconds isn't really unusual. NTFS does not perform well when you have a lot of files in a single directory. When you ask for a file, it has to search a very large directory index. With 1,000 files, that doesn't take too long. With 10,000 files, you notice it. With 4 million files . . . yeah, it takes a while.
You can probably do this even faster if you pre-load all of the directory entries into memory. Then, rather than calling the FileInfo constructor for each file, you just look it up in your dictionary.
Something like:
var dirInfo = new DirectoryInfo(path);

// Get a list of all files in the directory.
var files = dirInfo.GetFileSystemInfos();

var cache = new Dictionary<string, FileSystemInfo>();
foreach (var f in files)
{
    cache.Add(f.FullName, f);
}
Now when you get a name from the database, you can just look it up in the dictionary. That might very well be faster than trying to get it from the disk each time.
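Putting that together with the MoveTo approach from the update above, a minimal sketch, assuming the database rows store bare file names and the cache is keyed the same way (the paths and values below are placeholders):

```csharp
using System.IO;
using System.Linq;

string location = @"C:\flat-directory";   // placeholder paths
string rootDir  = @"C:\sorted";

// Build the cache once, keyed by file name (the key the database rows use).
var cache = new DirectoryInfo(location)
    .EnumerateFiles()
    .ToDictionary(f => f.Name);

// For each row read from the database (fileName, fileYear, fileMonth):
string fileName = "example.txt", fileYear = "2013", fileMonth = "07";
if (cache.TryGetValue(fileName, out FileInfo fi))
{
    string targetDir = Path.Combine(rootDir, fileYear, fileMonth);
    Directory.CreateDirectory(targetDir);   // no-op if it already exists
    fi.MoveTo(Path.Combine(targetDir, fileName));
    // ...then update the database location field.
}
```

EnumerateFiles streams entries rather than buffering them the way GetFileSystemInfos does, which matters with 4 million files; the one-time cost of building the dictionary is then amortized across all the moves.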
You can move the files in parallel, and Directory.EnumerateFiles gives you a lazily evaluated list of files (of course, I have not tested it with 4,000,000 files):
var numberOfConcurrentMoves = 2;
var moves = new List<Task>();
var sourceDirectory = "source-directory";
var destinationDirectory = "destination-directory";

// Make sure the destination exists before any moves start.
Directory.CreateDirectory(destinationDirectory);

foreach (var filePath in Directory.EnumerateFiles(sourceDirectory))
{
    var path = filePath; // capture a copy for the closure
    var move = Task.Run(() =>
    {
        File.Move(path, Path.Combine(destinationDirectory, Path.GetFileName(path)));
        //UPDATE DB
    });
    moves.Add(move);

    if (moves.Count >= numberOfConcurrentMoves)
    {
        Task.WaitAll(moves.ToArray());
        moves.Clear();
    }
}

// Wait for any tasks left over from the last partial batch.
Task.WaitAll(moves.ToArray());