Strings joining and complexity?

2019-03-30 09:12发布

问题:

When I need to join two strings I use String.Format (or StringBuilder if it happens in several places in the code).

I see that some good programmers doesn't give attention to strings joining complexity and just use the '+' operator.

I know that using the '+' operator make the application to use more memory, but what about complexity?

回答1:

This is an excellent article about the different string join methods by our own Jeff Atwood on Coding Horror:

alt text http://www.codinghorror.com/blog/images/coding-horror-official-logo-small.png

The Sad Tragedy of Micro-Optimization Theater

Here is the gist of the post.

[several string join methods shown]

Take your itchy little trigger finger off that compile key and think about this for a minute. Which one of these methods will be faster?

Got an answer? Great!

And.. drumroll please.. the correct answer:

It. Just. Doesn't. Matter!



回答2:

This answer assumes you are talking about the runtime complexity.

Using + creates a new string object, which means the contents of both the old string objects must be copied into the new one. With a large amount of concatenation, such as in a tight loop, this can turn into an O(n^2) operation.

As an informal proof, say you had the following code:

string foo = "a";
for(int i = 0; i < 1000; i++)
{
    foo += "a";
}

The first iteration of the loop, first the contents of foo ("a") are copied into a new string object, then the contents of the literal "a". That's two copies. The second iteration has three copies; two from the new foo, and one from the literal "a". The 1000th iteration will have 1001 copy operations. The total number of copies is 2 + 3 + ... + 1001. In general, if in a loop you are only concatenating one character each iteration (and you start at one character long), if the number of iterations is n, there will be 2 + 3 + ... + n + 1 copies. That's the same as 1 + 2 + 3 + ... + n = n(n+1)/2 = (n^2 + n)/2, which is O(n^2).



回答3:

Depends on the situation. The + can sometimes reduce the complexity of the code. Consider the following code:

output = "<p>" + intro + "</p>";

That is a good, clear line. No String.Format required.



回答4:

If you use + only once, you have no disadvantage from it and it increases readability (as Colin Pickard already stated).

As far as I know + means: take left operand and right operand and copy them into a new buffer (as strings are immutable).

So using + two times (as in Colin Pickards example you already create 2 temporary strings. First when adding "<p>" to intro and then when adding "</p>" to the newly created string.

You have to consider for yourself when to use which method. Even for a small example like seen above performance drop can be serious if intro is a large enough string.



回答5:

Unless your application is very string-intensive (profile, profile, profile!), this doesn't really matter. Good programmers put readability above performance for mundane operations.



回答6:

I think in terms of complexity you trade reiteration of newly created strings for parsing format string. For axample "A" + "B" + "C" + "D" means that you would have to copy "A", "AB", and at last "ABC" in order to form "ABCD". Copying is reiteration, right? So if for example you have a 1000 character string that you will sum with thousand one character strings you will copy (1000+N) character strings 1000 times. It leads to O(n^2) complexity in worst cases.

Strin.Fomat, even considering parsing, and StringBuffer should be around O(n).



回答7:

Because strings are immutable in languages like Java and C#, everytime two strings are concatenated a new string has to be created, in which the contents of the two old strings are copied.

Assume strings which are on average c characters long.

Now the first concatenation only has to copy 2*c characters, but the last one has to copy the concatenation of the first n-1 strings, which is (n-1)*c characters long, and the last one itself, which is c characters long, for a total of n*c characters. For n concatenations this makes n^2*c/2 character copies, which means an algorithmic complexity of O(n^2).

In most cases in practice however this quadratic complexity will not be noticeable (as Jeff Atwood shows in the blog entry linked to by Robert C. Cartaino) and I'd advise to just write the code as readable as possible.

There are cases however when it does matter, and using O(n^2) in such cases may be deadly.

In practice I've seen this for example for generating big Word XML files in memory, including base64 encoded pictures. This generation used to take over 10 minutes due to using O(n^2) string concatenation. After I replaced concatenation using + with StringBuilder the running time for the same document reduced below 10 seconds.

Similarly I've seen a piece of software that generated an epically big piece of SQL code as a string using + for concatenation. I haven't even waited till this finished (had been waiting for over an hour already), but just rewrote it using StringBuilder. This faster version finished within a minute.

In short, just do whatever is most readable / easiest to write and only think about this when you'll be creating a freaking huge string :-)



回答8:

StringBuilder should be used if you are building a large string in several steps. It also is a good thing if you know about how large it will be eventually, then you can initialize it with the size you need, and prevent costing re-allocations. For small operations it will not be considerable performance loss using the + operator, and it will result in clearer code (and faster to write...)



回答9:

Plenty of input already, but I've always felt that the best way to approach the issue of performance is to understand the performance differences of all viable solutions and for those that meet performance requirements, pick the one that is the most reliable and the most supportable.

There are many who use Big O notation to understand complexity, but I've found that in most cases (including understanding which string concatenation methods work best), a simple time trial will suffice. Just compare strA+strB to strA.Append(strB) in a loop of 100,000 iterations to see which works faster.



回答10:

The compiler optimizes string literal concatenation into one string literal. For example:

string s = "a" + "b" + "c";

is optimized to the following at compile time:

string s = "abc";

See this question and this MSDN article for more information.



回答11:

Compiler will optimize: "a" + "b" + "c" to be replaced with String.Concat method (not String.Format one as fixed me comments)



回答12:

I benchmarked this forever ago, and it hasn't really made a differences since .NET 1.0 or 1.1.

Back then if you had some process that was going to hit a line of code that was concatinating strings a few million times you could get a huge speed increase by using String.Concat, String.Format, or StringBuilder.

Now it doesn't matter at all. At least it hasn't mattered since .Net 2.0 came out anyhow. Put it out of your mind and code in whatever manner makes it easiest for you to read.