Is using StringBuilder Remove method more memory e

2019-04-24 06:39发布

站内文章 / C#

47 0

三岁会撩人

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

In C# which is more memory efficient: Option #1 or Option #2?

public void TestStringBuilder()
{
    //potentially a collection with several hundred items:
    string[] outputStrings = new string[] { "test1", "test2", "test3" };

    //Option #1
    StringBuilder formattedOutput = new StringBuilder();
    foreach (string outputString in outputStrings)
    {
        formattedOutput.Append("prefix ");
        formattedOutput.Append(outputString);
        formattedOutput.Append(" postfix");

        string output = formattedOutput.ToString();
        ExistingOutputMethodThatOnlyTakesAString(output);

        //Clear existing string to make ready for next iteration:
        formattedOutput.Remove(0, output.Length);
    }

    //Option #2
    foreach (string outputString in outputStrings)
    {
        StringBuilder formattedOutputInsideALoop = new StringBuilder();

        formattedOutputInsideALoop.Append("prefix ");
        formattedOutputInsideALoop.Append(outputString);
        formattedOutputInsideALoop.Append(" postfix");

        ExistingOutputMethodThatOnlyTakesAString(
           formattedOutputInsideALoop.ToString());
    }
}

private void ExistingOutputMethodThatOnlyTakesAString(string output)
{
    //This method actually writes out to a file.
    System.Console.WriteLine(output);
}

回答1:

Several of the answers gently suggested that I get off my duff and figure out it myself so below are my results. I think that sentiment generally goes against the grain of this site but if you want something done right, you might as well do.... :)

I modified option #1 to take advantage of @Ty suggestion to use StringBuilder.Length = 0 instead of the Remove method. This made the code of the two options more similar. The two differences are now whether the constructor for the StringBuilder is in or out of the loop and option #1 now uses the the Length method to clear the StringBuilder. Both options were set to run over an outputStrings array with 100,000 elements to make the garbage collector do some work.

A couple answers offered hints to look at the various PerfMon counters & such and use the results to pick an option. I did some research and ended up using the built-in Performance Explorer of the Visual Studio Team Systems Developer edition that I have at work. I found the second blog entry of a multipart series that explained how to set it up here. Basically, you wire up a unit test to point at the code you want to profile; go through a wizard & some configurations; and launch the unit test profiling. I enabled the .NET object allocation & lifetime metrics. The results of the profiling where difficult to format for this answer so I placed them at the end. If you copy and paste the text into Excel and massage them a bit, they'll be readable.

Option #1 is the most memory efficiency because it makes the garbage collector do a little less work and it allocates half the memory and instances to the StringBuilder object than Option #2. For everyday coding, picking option #2 is perfectly fine.

If you're still reading, I asked this question because Option #2 will make the memory leak detectors of an experience C/C++ developer go ballistic. A huge memory leak will occur if the StringBuilder instance is not released before being reassigned. Of course, we C# developers don't worry about such things (until they jump up and bite us). Thanks to all!!

ClassName   Instances   TotalBytesAllocated Gen0_InstancesCollected Gen0BytesCollected  Gen1InstancesCollected  Gen1BytesCollected
=======Option #1                    
System.Text.StringBuilder   100,001 2,000,020   100,016 2,000,320   2   40
System.String   301,020 32,587,168  201,147 11,165,268  3   246
System.Char[]   200,000 8,977,780   200,022 8,979,678   2   90
System.String[] 1   400,016 26  1,512   0   0
System.Int32    100,000 1,200,000   100,061 1,200,732   2   24
System.Object[] 100,000 2,000,000   100,070 2,004,092   2   40
======Option #2                 
System.Text.StringBuilder   200,000 4,000,000   200,011 4,000,220   4   80
System.String   401,018 37,587,036  301,127 16,164,318  3   214
System.Char[]   200,000 9,377,780   200,024 9,379,768   0   0
System.String[] 1   400,016 20  1,208   0   0
System.Int32    100,000 1,200,000   100,051 1,200,612   1   12
System.Object[] 100,000 2,000,000   100,058 2,003,004   1   20

回答2:

Option 2 should (I believe) actually outperform option 1. The act of calling Remove "forces" the StringBuilder to take a copy of the string it's already returned. The string is actually mutable within StringBuilder, and StringBuilder doesn't take a copy unless it needs to. With option 1 it copies before basically clearing the array out - with option 2 no copy is required.

The only downside of option 2 is that if the string ends up being long, there will be multiple copies made while appending - whereas option 1 keeps the original size of buffer. If this is going to be the case, however, specify an initial capacity to avoid the extra copying. (In your sample code, the string will end up being bigger than the default 16 characters - initializing it with a capacity of, say, 32 will reduce the extra strings required.)

Aside from the performance, however, option 2 is just cleaner.

回答3:

While your profiling, you could also try just setting the length of the StringBuilder to zero when you enter the loop.

formattedOutput.Length = 0;

回答4:

Since you are concerned only with memory I would suggest:

foreach (string outputString in outputStrings)
    {    
        string output = "prefix " + outputString + " postfix";
        ExistingOutputMethodThatOnlyTakesAString(output)  
    }

The variable named output is the same size in your original implementation, but no other objects are needed. StringBuilder uses strings and other objects internally and you will be created many objects that need to be GC'd.

Both the line from option 1:

string output = formattedOutput.ToString();

And the line from option 2:

ExistingOutputMethodThatOnlyTakesAString(
           formattedOutputInsideALoop.ToString());

will create an immutable object with the value of the prefix + outputString + postfix. This string is the same size no matter how you create it. What you are really asking is which is more memory efficient:

    StringBuilder formattedOutput = new StringBuilder(); 
    // create new string builder

    formattedOutput.Remove(0, output.Length); 
    // reuse existing string builder

Skipping the StringBuilder entirely will be more memory efficient than either of the above.

If you really need to know which of the two is more efficient in your application (this will probably vary based on the size of your list, prefix, and outputStrings) I would recommend red-gate ANTS Profiler http://www.red-gate.com/products/ants_profiler/index.htm

Jason

回答5:

Hate to say it, but how about just testing it?

回答6:

This stuff is easy to find out by yourself. Run Perfmon.exe and add a counter for .NET Memory + Gen 0 Collections. Run the test code a million times. You'll see that option #1 requires half of the number of collections option #2 needs.

回答7:

We've talked about this before with Java, here's the [Release] results of the C# version:

Option #1 (10000000 iterations): 11264ms
Option #2 (10000000 iterations): 12779ms

Update: In my non-scientific analysis allowing the two methods to execute while monitoring all the memory performance counters in perfmon did not result in any sort of discernible difference with either method (other than having some counters spike only while either test was executing).

And here's what I used to test:

class Program
{
    const int __iterations = 10000000;

    static void Main(string[] args)
    {
        TestStringBuilder();
        Console.ReadLine();
    }

    public static void TestStringBuilder()
    {
        //potentially a collection with several hundred items:
        var outputStrings = new [] { "test1", "test2", "test3" };

        var stopWatch = new Stopwatch();

        //Option #1
        stopWatch.Start();
        var formattedOutput = new StringBuilder();

        for (var i = 0; i < __iterations; i++)
        {
            foreach (var outputString in outputStrings)
            {
                formattedOutput.Append("prefix ");
                formattedOutput.Append(outputString);
                formattedOutput.Append(" postfix");

                var output = formattedOutput.ToString();
                ExistingOutputMethodThatOnlyTakesAString(output);

                //Clear existing string to make ready for next iteration:
                formattedOutput.Remove(0, output.Length);
            }
        }
        stopWatch.Stop();

        Console.WriteLine("Option #1 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
            Console.ReadLine();
        stopWatch.Reset();

        //Option #2
        stopWatch.Start();
        for (var i = 0; i < __iterations; i++)
        {
            foreach (var outputString in outputStrings)
            {
                StringBuilder formattedOutputInsideALoop = new StringBuilder();

                formattedOutputInsideALoop.Append("prefix ");
                formattedOutputInsideALoop.Append(outputString);
                formattedOutputInsideALoop.Append(" postfix");

                ExistingOutputMethodThatOnlyTakesAString(
                   formattedOutputInsideALoop.ToString());
            }
        }
        stopWatch.Stop();

        Console.WriteLine("Option #2 ({1} iterations): {0}ms", stopWatch.ElapsedMilliseconds, __iterations);
    }

    private static void ExistingOutputMethodThatOnlyTakesAString(string s)
    {
        // do nothing
    }
}

Option 1 in this scenario is marginally faster though option 2 is easier to read and maintain. Unless you happen to be performing this operation millions of times back to back I would stick with Option 2 because I suspect that option 1 and 2 are about the same when running in a single iteration.

回答8:

I'd say option #2 if definitely more straightforward. In terms of performance, sounds like something you'd just need to test and see. I'd guess that it doesn't make enough difference to choose the less straightforward option.

回答9:

I think option 1 would be slightly more memory efficient as a new object isn't created everytime. Having said that, the GC does a pretty good job of cleaning up resources like in option 2.

I think you may be falling into the trap of premature optimization (the root of all evil --Knuth). Your IO is going to take much more resources than the string builder.

I tend go with the clearer/cleaner option, in this case option 2.

Rob

回答10:

Measure it
Pre-allocate as close as possible to how much memory you think you'll need
If speed is your preference, then consider a fairly straight forward multi-threaded front to middle, middle to end concurrent approach (expand division of labour as required)
measure it again

what's more important to you?