Is there a more elegant/faster way to write the code below? Currently taking about 45seconds.
query.sql is 200,000 lines long and has SQL in it exactly like this on each line:
SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL
I found that by chunking into blocks of 1000 that things were much quicker than waiting until the end and using WriteAllText (which took about 20 minutes to run)
static void Main(string[] args)
{
var s = new Stopwatch();
s.Start();
string textToWrite = "";
string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
int i = 0;
foreach (var line in lines)
{
var bits = line.Split('\'');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
"'" + value1 + "', "
+ value2 + ", "
+ value3 + ")";
textToWrite += message + Environment.NewLine;
if (i % 1000 == 0)
{
Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
textToWrite = "";
}
i++;
}
//File.WriteAllText(@"e:\temp\query2.sql", textToWrite);
File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
Edit: StringBuilder Solution (1000ms):
static void Main2(string[] args)
{
var s = new Stopwatch();
s.Start();
var textToWrite = new StringBuilder();
string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
int i = 0;
foreach (var line in lines)
{
var bits = line.Split('\'');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
"'" + value1 + "', "
+ value2 + ", "
+ value3 + ")"
+ Environment.NewLine;
textToWrite.Append(message);
// Buffering
if (i % 1000 == 0)
{
Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
textToWrite = new StringBuilder();
}
i++;
}
File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
Edit: StreamWriter solution (450ms)
static void Main(string[] args)
{
var s = new Stopwatch();
s.Start();
string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
int i = 0;
using (StreamWriter writer = File.AppendText(@"e:\temp\query2.sql"))
{
foreach (var line in lines)
{
var bits = line.Split('\'');
var value1 = bits[1];
var value2 = bits[3];
var value3 = bits[5];
writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
value1, value2, value3);
i++;
}
}
s.Stop();
TimeSpan ts = s.Elapsed;
Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
Console.WriteLine("Total records: " + i);
Console.ReadLine();
}
The best way would most likely be to open both files simultaneously, read and write each line as you go, then close the files.
However, the biggest problem you are most likely running into is string concatenation. Strings in .NET are immutable, so each concatenation results in a new copy being allocated, which takes both time and memory (although the GC will eventually give you back the latter).
If you replace
textToWrite
with aStringBuilder
, and only do a singleToString()
at the end, you will see a lot better performance.Or, honestly, you could probably do a single regex replace on the whole thing and be done with it, although I believe you have to read the whole file into memory first as you are already doing.
A good start is to use the StringBuilder built in class in .net. This is going to avoid a bunch of string allocations and copying.
See the MSDN documenation about how it works: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
Also see this Stackoverflow post for more information: Most efficient way to concatenate strings?
Example:
As others have pointed out, use a
StringBuilder
. So in your case, declare:Then:
Although you'd probably be better off dispensing with the buffering altogether:
What version of sql server? The best way to do this is not to use one giant sql script, but to use either a table valued parameter or use sql servers bulk copy support.
MemoryMappedFiles are efficient so they may be worth looking into.
You may find building the entire text first and then writing the entire thing to the MemoryMappedFiled performs better as there are fewer calls to
ToString
instead.