Code:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    string[] fileAry = Directory.GetFiles(dirPath, filePattern);

    Console.WriteLine("Total File Count : " + fileAry.Length);

    using (TextWriter tw = new StreamWriter(destFile, true))
    {
        foreach (string filePath in fileAry)
        {
            using (TextReader tr = new StreamReader(filePath))
            {
                tw.WriteLine(tr.ReadToEnd());
                tr.Close();
                tr.Dispose();
            }
            Console.WriteLine("File Processed : " + filePath);
        }

        tw.Close();
        tw.Dispose();
    }
}
I need to optimize this as it's extremely slow: it takes 3 minutes for 45 XML files of average size 40-50 MB.

Please note: 45 files of an average 45 MB is just one example; it can be n files of size m, where n is in the thousands and m can average 128 KB. In short, it can vary.
Could you please provide any views on optimization?
Several things you can do:

- In my experience the default buffer sizes can be increased with noticeable benefit up to about 120K; I suspect setting a large buffer on all streams will be the easiest and most noticeable performance booster (see the sketch after this list).
- Use the Stream class, not the StreamReader class.
- Drop the explicit Close() and Dispose() calls; the using statement already disposes the streams.
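A minimal sketch of what that might look like, copying raw bytes through large buffers instead of going through StreamReader/StreamWriter (the 128 KB size and the FileStream options are my assumptions, not from the original answer):

// Hypothetical sketch: copy raw bytes with ~128 KB buffers on both streams.
// Note: unlike the original tw.WriteLine(...), this does not insert a newline between files.
const int BufferSize = 128 * 1024;   // assumption: ~128 KB
byte[] buffer = new byte[BufferSize];

using (var output = new FileStream(destFile, FileMode.Append, FileAccess.Write, FileShare.None, BufferSize))
{
    foreach (string filePath in Directory.GetFiles(dirPath, filePattern))
    {
        using (var input = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, BufferSize))
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
        Console.WriteLine("File Processed : " + filePath);
    }
}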
One option is to utilize the copy command, and let it do what it does well.
Something like:
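The code that accompanied this answer is not shown above; a hedged sketch of what it might have looked like, shelling out to cmd.exe so that copy /b does the concatenation (the exact invocation is my assumption):

// Hypothetical sketch: let the OS copy command concatenate the files.
// Equivalent command line: copy /b *.xml merged.xml
// Requires System.IO and System.Diagnostics.
var psi = new System.Diagnostics.ProcessStartInfo("cmd.exe",
    "/c copy /b \"" + Path.Combine(dirPath, filePattern) + "\" \"" + destFile + "\"")
{
    UseShellExecute = false,
    CreateNoWindow = true
};

using (var process = System.Diagnostics.Process.Start(psi))
{
    process.WaitForExit();
}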
Why not just use the Stream.CopyTo() method?
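For example, a minimal sketch (the append mode and the file enumeration simply mirror the question's code):

using (var output = new FileStream(destFile, FileMode.Append, FileAccess.Write))
{
    foreach (string filePath in Directory.GetFiles(dirPath, filePattern))
    {
        using (var input = File.OpenRead(filePath))
        {
            // CopyTo handles the buffering and the read/write loop internally.
            input.CopyTo(output);
        }
    }
}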
I would use a BlockingCollection to read so you can read and write concurrently.
Clearly you should write to a separate physical disk to avoid hardware contention. Code along the lines of the sketch below will preserve order.

Read is going to be faster than write, so there is no need for a parallel read.

Again, since read is going to be faster, limit the size of the collection so the read does not get farther ahead of the write than it needs to.

A simple task that reads the next file in parallel while writing the current one has the problem of differing file sizes: writing a small file is faster than reading a big one.

I use this pattern to read and parse text on T1 and then insert to SQL on T2.
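The answer's original code is not shown above; a minimal producer/consumer sketch of the idea (the bounded capacity of 10 and the method shape are my assumptions):

// Hypothetical sketch: one task reads files into a bounded queue, the caller writes.
// Bounding the collection keeps the reader from getting too far ahead of the writer.
// Requires System, System.Collections.Concurrent, System.IO, System.Threading.Tasks.
static void ConcatenateFiles(string dirPath, string filePattern, string destFile)
{
    var queue = new BlockingCollection<byte[]>(boundedCapacity: 10);   // assumption: 10

    // Producer (T1): read each file and add its contents to the queue, in order.
    var reader = Task.Run(() =>
    {
        foreach (string filePath in Directory.GetFiles(dirPath, filePattern))
        {
            queue.Add(File.ReadAllBytes(filePath));
        }
        queue.CompleteAdding();   // signal the consumer that no more items are coming
    });

    // Consumer (T2): write, ideally to a different physical disk than the source files.
    using (var output = new FileStream(destFile, FileMode.Append, FileAccess.Write))
    {
        foreach (byte[] contents in queue.GetConsumingEnumerable())
        {
            output.Write(contents, 0, contents.Length);
        }
    }

    reader.Wait();   // propagate any exceptions from the read task
}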
BlockingCollection Class