String.Replace() vs. StringBuilder.Replace()

Posted 2020-01-25 05:50

Question:

I have a string in which I need to replace markers with values from a dictionary. It has to be as efficient as possible. Looping over the keys with String.Replace() is just going to consume memory (strings are immutable, remember). Would StringBuilder.Replace() be any better, since it was designed for string manipulation?

I was hoping to avoid the expense of RegEx, but if that is going to be more efficient then so be it.

Note: I don't care about code complexity, only how fast it runs and the memory it consumes.

Average stats: 255-1024 characters in length, 15-30 keys in the dictionary.

Answer 1:

I profiled the following code with the RedGate profiler:

using System;
using System.Collections.Generic;
using System.Text;

class Program
    {
        static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
        static Dictionary<string, string> values;

        static void Main(string[] args)
        {
            Console.WriteLine("Data length: " + data.Length);
            values = new Dictionary<string, string>()
            {
                { "ab", "aa" },
                { "jk", "jj" },
                { "lm", "ll" },
                { "yz", "zz" },
                { "ef", "ff" },
                { "st", "uu" },
                { "op", "pp" },
                { "x", "y" }
            };

            StringReplace(data);
            StringBuilderReplace1(data);
            StringBuilderReplace2(new StringBuilder(data, data.Length * 2));

            Console.ReadKey();
        }

        private static void StringReplace(string data)
        {
            foreach(string k in values.Keys)
            {
                data = data.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace1(string data)
        {
            StringBuilder sb = new StringBuilder(data, data.Length * 2);
            foreach (string k in values.Keys)
            {
                sb.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace2(StringBuilder data)
        {
            foreach (string k in values.Keys)
            {
                data.Replace(k, values[k]);
            }
        }
    }
  • String.Replace = 5.843ms
  • StringBuilder.Replace #1 = 4.059ms
  • StringBuilder.Replace #2 = 0.461ms

String length = 1456

StringBuilder #1 creates the StringBuilder inside the method, while #2 receives one that was already constructed. The overall cost will most likely end up the same, because #2 merely moves the construction work out of the measured method. If you already start with a StringBuilder instead of a string, then #2 might be the way to go.

As for memory, the RedGate memory profiler showed nothing to worry about until you get into MANY replace operations, at which point StringBuilder is going to win overall.



Answer 2:

This may be of help:

http://blogs.msdn.com/b/debuggingtoolbox/archive/2008/04/02/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance.aspx

The short answer appears to be that String.Replace is faster, although it may have a larger impact on your memory footprint / garbage collection overhead.



Answer 3:

Yes, StringBuilder will give you a gain in both speed and memory, basically because it does not create a new string instance each time you manipulate it; StringBuilder always operates on the same object. Here is an MSDN link with some details.



Answer 4:

"Would StringBuilder.Replace() be any better [than String.Replace]?"

Yes, a lot better. And if you can estimate an upper bound for the new string (it looks like you can) then it will probably be fast enough.

When you create it like:

  var sb = new StringBuilder(inputString, pessimisticEstimate);

then the StringBuilder will not have to re-allocate its buffer.
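
One way such a pessimistic estimate could be derived is sketched below. It is only an illustration: the class and method names are made up, and it assumes that replacement values never introduce new markers, so each character of the input grows by at most the worst value-to-key length ratio. The capacity is just a hint, so StringBuilder still grows on its own if the estimate turns out too small.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, not part of the original answer.
static class CapacityEstimate
{
    // Upper bound on the result length, assuming replacement values
    // never create new markers.
    public static int PessimisticLength(string input, IDictionary<string, string> values)
    {
        double worstRatio = 1.0;
        foreach (KeyValuePair<string, string> pair in values)
        {
            if (pair.Key.Length == 0) continue; // empty markers are skipped
            double ratio = (double)pair.Value.Length / pair.Key.Length;
            worstRatio = Math.Max(worstRatio, ratio);
        }
        return (int)Math.Ceiling(input.Length * worstRatio);
    }

    public static string ReplaceAll(string input, IDictionary<string, string> values)
    {
        // Pre-size the buffer once so the replacements do not force a re-allocation.
        var sb = new StringBuilder(input, PessimisticLength(input, values));
        foreach (KeyValuePair<string, string> pair in values)
        {
            if (pair.Key.Length == 0) continue; // StringBuilder.Replace rejects empty keys
            sb.Replace(pair.Key, pair.Value);
        }
        return sb.ToString();
    }
}
```

Calling CapacityEstimate.ReplaceAll(text, values) then behaves like the loop from the first answer, with the buffer sized up front.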



Answer 5:

Converting data from a String to a StringBuilder and back will take some time. If one is only performing a single replace operation, this time may not be recouped by the efficiency improvements inherent in StringBuilder. On the other hand, if one converts a string to a StringBuilder, then performs many Replace operations on it, and converts it back at the end, the StringBuilder approach is apt to be faster.



Answer 6:

Rather than running 15-30 replace operations on the entire string, it might be more efficient to use something like a trie data structure to hold your dictionary. Then you can loop through your input string once to do all your searching/replacing.
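
A minimal sketch of that idea follows, with hypothetical names (MarkerTrie, Replace) and no tuning. It walks the input from left to right and, at each position, follows the trie as far as it can, keeping the longest marker that matches. Note that this is not identical to chained Replace calls: replaced text is never rescanned, and when one marker is a prefix of another the longer one wins.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical class, not part of the original answer.
sealed class MarkerTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public string Replacement; // non-null only where a marker ends
    }

    private readonly Node root = new Node();

    public MarkerTrie(IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
        {
            if (pair.Key.Length == 0) continue; // an empty marker would match everywhere
            Node node = root;
            foreach (char c in pair.Key)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.Replacement = pair.Value;
        }
    }

    public string Replace(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            Node node = root;
            string best = null;
            int bestEnd = i;
            // Follow the trie from position i, remembering the longest marker seen.
            for (int j = i; j < input.Length; j++)
            {
                Node next;
                if (!node.Children.TryGetValue(input[j], out next)) break;
                node = next;
                if (node.Replacement != null)
                {
                    best = node.Replacement;
                    bestEnd = j + 1;
                }
            }
            if (best != null)
            {
                sb.Append(best);
                i = bestEnd; // skip past the marker that was replaced
            }
            else
            {
                sb.Append(input[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

The trie would be built once per dictionary and reused for every input string, so the construction cost is paid only once.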



Answer 7:

It will depend a lot on how many of the markers are present in a given string on average.

Performance of searching for a key is likely to be similar between StringBuilder and String, but StringBuilder will win if you have to replace many markers in a single string.

If you only expect one or two markers per string on average, and your dictionary is small, I would just go for the String.Replace.

If there are many markers, you might want to define a custom syntax to identify markers - e.g. enclosing in braces with a suitable escaping rule for a literal brace. You can then implement a parsing algorithm that iterates through the characters of the string once, recognizing and replacing each marker that it finds. Or use a regex.
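
As a rough sketch of the brace-delimited variant, the following assumes a syntax that is not specified in the original answer: markers look like {key}, and "{{" is the escape for a literal opening brace. Unknown or unterminated markers are left in the output as plain text. The names BraceTemplate and Expand are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical class; the marker syntax is an assumption.
static class BraceTemplate
{
    public static string Expand(string input, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c != '{')
            {
                sb.Append(c); // ordinary character, copy it through
                i++;
                continue;
            }
            // "{{" is the escape for a literal brace.
            if (i + 1 < input.Length && input[i + 1] == '{')
            {
                sb.Append('{');
                i += 2;
                continue;
            }
            int close = input.IndexOf('}', i + 1);
            string value;
            if (close > i && values.TryGetValue(input.Substring(i + 1, close - i - 1), out value))
            {
                sb.Append(value); // known marker: emit its value
                i = close + 1;
            }
            else
            {
                sb.Append(c); // unknown or unterminated marker: keep the brace as text
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Because a marker can only start at a brace, the dictionary lookup happens once per candidate marker instead of once per key per string.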



Answer 8:

My two cents: I just wrote a couple of lines of code to test how each method performs and, as expected, the result is "it depends".

For longer strings Regex seems to perform better; for shorter strings, String.Replace does. As far as I can see, StringBuilder.Replace is not very useful here, and if used wrongly it can be very costly from a GC perspective (I tried sharing one StringBuilder instance).

Check my StringReplaceTests GitHub repo.
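
For reference, a single-pass Regex variant could look roughly like the sketch below. It is not taken from the repo mentioned above; the class and method names are hypothetical. All markers are escaped and joined into one alternation, longest first so that a marker is preferred over its own prefix, and each match is looked up in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not the code from the linked repo.
static class RegexReplacer
{
    // Build the pattern once and reuse it for every input string.
    public static Regex BuildPattern(IDictionary<string, string> values)
    {
        List<string> alternatives = values.Keys
            .Where(key => key.Length > 0)
            .OrderByDescending(key => key.Length)
            .Select(key => Regex.Escape(key))
            .ToList();
        if (alternatives.Count == 0)
        {
            throw new ArgumentException("At least one non-empty marker is required.", "values");
        }
        return new Regex(string.Join("|", alternatives), RegexOptions.Compiled);
    }

    public static string ReplaceAll(Regex pattern, string input, IDictionary<string, string> values)
    {
        // Every match is exactly one of the markers, so the lookup cannot miss.
        return pattern.Replace(input, match => values[match.Value]);
    }
}
```

RegexOptions.Compiled trades a slower construction for faster matching, which only pays off if the same pattern is reused many times.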



Answer 9:

The problem with @DustinDavis' answer is that it repeatedly operates on the same string. Unless you're planning on doing a back-and-forth type of manipulation, you really should have separate objects for each manipulation case in this kind of test.

I decided to create my own test because I found some conflicting answers all over the Web, and I wanted to be completely sure. The program I am working on deals with a lot of text (files with tens of thousands of lines in some cases).

So here's a quick method you can copy and paste and see for yourself which is faster. You may have to create your own text file to test, but you can easily copy and paste text from anywhere and make a large enough file for yourself:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Windows;

void StringReplace_vs_StringBuilderReplace( string file, string word1, string word2 )
{
    using( FileStream fileStream = new FileStream( file, FileMode.Open, FileAccess.Read ) )
    using( StreamReader streamReader = new StreamReader( fileStream, Encoding.UTF8 ) )
    {
        string text = streamReader.ReadToEnd(),
               @string = text;
        StringBuilder @StringBuilder = new StringBuilder( text );
        int iterations = 10000;

        Stopwatch watch1 = Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @string = @string.Replace( word1, word2 );
            else @string = @string.Replace( word2, word1 );
        watch1.Stop();
        double stringMilliseconds = watch1.ElapsedMilliseconds;

        Stopwatch watch2 = Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @StringBuilder = @StringBuilder.Replace( word1, word2 );
            else @StringBuilder = @StringBuilder.Replace( word2, word1 );
        watch2.Stop();
        double StringBuilderMilliseconds = watch2.ElapsedMilliseconds;

        MessageBox.Show( string.Format( "string.Replace: {0}\nStringBuilder.Replace: {1}",
                                        stringMilliseconds, StringBuilderMilliseconds ) );
    }
}

In my runs, string.Replace() was consistently about 20% faster when swapping out 8-10 letter words. Try it for yourself if you want your own empirical evidence.