In C# which is more memory efficient: Option #1 or Option #2?
public void TestStringBuilder()
{
//potentially a collection with several hundred items:
string[] outputStrings = new string[] { "test1", "test2", "test3" };
//Option #1
StringBuilder formattedOutput = new StringBuilder();
foreach (string outputString in outputStrings)
{
formattedOutput.Append("prefix ");
formattedOutput.Append(outputString);
formattedOutput.Append(" postfix");
string output = formattedOutput.ToString();
ExistingOutputMethodThatOnlyTakesAString(output);
//Clear existing string to make ready for next iteration:
formattedOutput.Remove(0, output.Length);
}
//Option #2
foreach (string outputString in outputStrings)
{
StringBuilder formattedOutputInsideALoop = new StringBuilder();
formattedOutputInsideALoop.Append("prefix ");
formattedOutputInsideALoop.Append(outputString);
formattedOutputInsideALoop.Append(" postfix");
ExistingOutputMethodThatOnlyTakesAString(
formattedOutputInsideALoop.ToString());
}
}
private void ExistingOutputMethodThatOnlyTakesAString(string output)
{
//This method actually writes out to a file.
System.Console.WriteLine(output);
}
Several of the answers gently suggested that I get off my duff and figure out it myself so below are my results. I think that sentiment generally goes against the grain of this site but if you want something done right, you might as well do.... :)
I modified option #1 to take advantage of @Ty suggestion to use StringBuilder.Length = 0 instead of the Remove method. This made the code of the two options more similar. The two differences are now whether the constructor for the StringBuilder is in or out of the loop and option #1 now uses the the Length method to clear the StringBuilder. Both options were set to run over an outputStrings array with 100,000 elements to make the garbage collector do some work.
A couple answers offered hints to look at the various PerfMon counters & such and use the results to pick an option. I did some research and ended up using the built-in Performance Explorer of the Visual Studio Team Systems Developer edition that I have at work. I found the second blog entry of a multipart series that explained how to set it up here. Basically, you wire up a unit test to point at the code you want to profile; go through a wizard & some configurations; and launch the unit test profiling. I enabled the .NET object allocation & lifetime metrics. The results of the profiling where difficult to format for this answer so I placed them at the end. If you copy and paste the text into Excel and massage them a bit, they'll be readable.
Option #1 is the most memory efficiency because it makes the garbage collector do a little less work and it allocates half the memory and instances to the StringBuilder object than Option #2. For everyday coding, picking option #2 is perfectly fine.
If you're still reading, I asked this question because Option #2 will make the memory leak detectors of an experience C/C++ developer go ballistic. A huge memory leak will occur if the StringBuilder instance is not released before being reassigned. Of course, we C# developers don't worry about such things (until they jump up and bite us). Thanks to all!!
ClassName Instances TotalBytesAllocated Gen0_InstancesCollected Gen0BytesCollected Gen1InstancesCollected Gen1BytesCollected
=======Option #1
System.Text.StringBuilder 100,001 2,000,020 100,016 2,000,320 2 40
System.String 301,020 32,587,168 201,147 11,165,268 3 246
System.Char[] 200,000 8,977,780 200,022 8,979,678 2 90
System.String[] 1 400,016 26 1,512 0 0
System.Int32 100,000 1,200,000 100,061 1,200,732 2 24
System.Object[] 100,000 2,000,000 100,070 2,004,092 2 40
======Option #2
System.Text.StringBuilder 200,000 4,000,000 200,011 4,000,220 4 80
System.String 401,018 37,587,036 301,127 16,164,318 3 214
System.Char[] 200,000 9,377,780 200,024 9,379,768 0 0
System.String[] 1 400,016 20 1,208 0 0
System.Int32 100,000 1,200,000 100,051 1,200,612 1 12
System.Object[] 100,000 2,000,000 100,058 2,003,004 1 20
Option 2 should (I believe) actually outperform option 1. The act of calling Remove
"forces" the StringBuilder to take a copy of the string it's already returned. The string is actually mutable within StringBuilder, and StringBuilder doesn't take a copy unless it needs to. With option 1 it copies before basically clearing the array out - with option 2 no copy is required.
The only downside of option 2 is that if the string ends up being long, there will be multiple copies made while appending - whereas option 1 keeps the original size of buffer. If this is going to be the case, however, specify an initial capacity to avoid the extra copying. (In your sample code, the string will end up being bigger than the default 16 characters - initializing it with a capacity of, say, 32 will reduce the extra strings required.)
Aside from the performance, however, option 2 is just cleaner.
While your profiling, you could also try just setting the length of the StringBuilder to zero when you enter the loop.
formattedOutput.Length = 0;
Since you are concerned only with memory I would suggest:
foreach (string outputString in outputStrings)
{
string output = "prefix " + outputString + " postfix";
ExistingOutputMethodThatOnlyTakesAString(output)
}
The variable named output is the same size in your original implementation, but no other objects are needed. StringBuilder uses strings and other objects internally and you will be created many objects that need to be GC'd.
Both the line from option 1:
string output = formattedOutput.ToString();
And the line from option 2:
ExistingOutputMethodThatOnlyTakesAString(
formattedOutputInsideALoop.ToString());
will create an immutable object with the value of the prefix + outputString + postfix. This string is the same size no matter how you create it. What you are really asking is which is more memory efficient:
StringBuilder formattedOutput = new StringBuilder();
// create new string builder
or
formattedOutput.Remove(0, output.Length);
// reuse existing string builder
Skipping the StringBuilder entirely will be more memory efficient than either of the above.
If you really need to know which of the two is more efficient in your application (this will probably vary based on the size of your list, prefix, and outputStrings) I would recommend red-gate ANTS Profiler http://www.red-gate.com/products/ants_profiler/index.htm
Jason
Hate to say it, but how about just testing it?
This stuff is easy to find out by yourself. Run Perfmon.exe and add a counter for .NET Memory + Gen 0 Collections. Run the test code a million times. You'll see that option #1 requires half of the number of collections option #2 needs.
We've talked about this before with Java, here's the [Release] results of the C# version:
Option #1 (10000000 iterations): 11264ms
Option #2 (10000000 iterations): 12779ms
Update: In my non-scientific analysis allowing the two methods to execute while monitoring all the memory performance counters in perfmon did not result in any sort of discernible difference with either method (other than having some counters spike only while either test was executing).
And here's what I used to test:
class Program
{
const int __iterations = 10000000;
static void Main(string[] args)
{
TestStringBuilder();
Console.ReadLine();
}
public static void TestStringBuilder()
{
//potentially a collection with several hundred items:
var outputStrings = new [] { "test1", "test2", "test3" };
var stopWatch = new Stopwatch();
//Option #1
stopWatch.Start();
var formattedOutput = new StringBuilder();
for (var i = 0; i < __iterations; i++)
{
foreach (var outputString in outputStrings)
{
formattedOutput.Append("prefix ");
formattedOutput.Append(outputString);
formattedOutput.Append(" postfix");
var output = formattedOutput.ToString();
ExistingOutputMethodThatOnlyTakesAString(output);
//Clear existing string to make ready for next iteration:
formattedOutput.Remove(0, output.Length);
}
}
stopWatch.Stop();
Console.WriteLine("Option #1 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
Console.ReadLine();
stopWatch.Reset();
//Option #2
stopWatch.Start();
for (var i = 0; i < __iterations; i++)
{
foreach (var outputString in outputStrings)
{
StringBuilder formattedOutputInsideALoop = new StringBuilder();
formattedOutputInsideALoop.Append("prefix ");
formattedOutputInsideALoop.Append(outputString);
formattedOutputInsideALoop.Append(" postfix");
ExistingOutputMethodThatOnlyTakesAString(
formattedOutputInsideALoop.ToString());
}
}
stopWatch.Stop();
Console.WriteLine("Option #2 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
}
private static void ExistingOutputMethodThatOnlyTakesAString(string s)
{
// do nothing
}
}
Option 1 in this scenario is marginally faster though option 2 is easier to read and maintain. Unless you happen to be performing this operation millions of times back to back I would stick with Option 2 because I suspect that option 1 and 2 are about the same when running in a single iteration.
I'd say option #2 if definitely more straightforward. In terms of performance, sounds like something you'd just need to test and see. I'd guess that it doesn't make enough difference to choose the less straightforward option.
I think option 1 would be slightly more memory efficient as a new object isn't created everytime. Having said that, the GC does a pretty good job of cleaning up resources like in option 2.
I think you may be falling into the trap of premature optimization (the root of all evil --Knuth). Your IO is going to take much more resources than the string builder.
I tend go with the clearer/cleaner option, in this case option 2.
Rob
- Measure it
- Pre-allocate as close as possible to how much memory you think you'll need
- If speed is your preference, then consider a fairly straight forward multi-threaded front to middle, middle to end concurrent approach (expand division of labour as required)
- measure it again
what's more important to you?
memory
speed
clarity