I'm trying to create a directory and copy a PDF file inside a Parallel.ForEach.
Below is a simple example:
private static void CreateFolderAndCopyFile(int index)
{
    const string sourcePdfPath = "c:\\testdata\\test.pdf";
    const string rootPath = "c:\\testdata";

    string folderDirName = string.Format("Data{0}", string.Format("{0:00000000}", index));
    string folderDirPath = rootPath + @"\" + folderDirName;
    Directory.CreateDirectory(folderDirPath);

    string desPdfPath = folderDirPath + @"\" + "test.pdf";
    File.Copy(sourcePdfPath, desPdfPath, true);
}
The method above creates a new folder and copies the PDF file into it, producing this directory tree:

TESTDATA
  -Data00000000
    -test.pdf
  -Data00000001
    -test.pdf
  ....
  -Data0000000N
    -test.pdf
I tried calling the CreateFolderAndCopyFile method in a Parallel.ForEach loop:
private static void Func<T>(IEnumerable<T> docs)
{
    int index = 0;
    Parallel.ForEach(docs, doc =>
    {
        CreateFolderAndCopyFile(index);
        index++;
    });
}
When I run this code, it fails with the following error:
The process cannot access the file 'c:\testdata\Data00001102\test.pdf' because it is being used by another process.
But before the error it had already created about 1111 new folders and copied test.pdf about 1111 times.
What caused this behaviour and how can it be resolved?
EDITED:

The code above was a toy sample; sorry for the hard-coded strings.

Conclusion: the parallel method is slow. Tomorrow I'll try some methods from How to write super-fast file-streaming code in C#?, especially: http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
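For reference, one technique commonly discussed in that context is copying through explicitly buffered FileStreams. A minimal sketch, with the caveats that CopyWithStreams is a hypothetical helper name and the 64 KB buffer size is an assumption to tune for your own disks:

```csharp
private static void CopyWithStreams(string sourcePath, string destPath)
{
    const int BufferSize = 64 * 1024; // assumed size; measure on your own hardware

    // Open source and destination with explicit buffer sizes and copy in chunks.
    using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, BufferSize))
    using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write,
                                       FileShare.None, BufferSize))
    {
        input.CopyTo(output, BufferSize);
    }
}
```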
Your increment operation on index is suspect in that it is not thread safe. If you change the operation to Console.WriteLine("{0}", index++) you will see this behavior. Instead you could use a Parallel.ForEach overload with a loop index.

You are not synchronizing access to index, which means you have a race on it; that is why you get the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment. However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly the cleaner solution to this particular problem.

But when you do get it working, you will find that copying files is IO bound rather than processor bound, and I predict that the parallel code will be slower than the serial code.