String.Concat inefficient code?

2020-06-07 05:05发布

问题:

I was investigating String.Concat : (Reflector)

very strange :

the have the values array ,

they creating a NEW ARRAY for which later they send him to ConcatArray.

Question :

Why they created a new array ? they had values from the first place...

edit

code :

public static string Concat(params string[] values)
{
    if (values == null)
    {
        throw new ArgumentNullException("values");
    }
    int totalLength = 0;
    string[] strArray = new string[values.Length];
    for (int i = 0; i < values.Length; i++)
    {
        string str = values[i];
        strArray[i] = (str == null) ? Empty : str;
        totalLength += strArray[i].Length;
        if (totalLength < 0)
        {
            throw new OutOfMemoryException();
        }
    }
    return ConcatArray(strArray, totalLength);
}

回答1:

Well for one thing, it means that the contents of the new array can be trusted to be non-null.... and unchanging.

Without that copying, another thread could modify the original array during the call to ConcatArray, which presumably could throw an exception or even trigger a security bug. With the copying, the input array can be changed at any time - each element will be read exactly once, so there can be no inconsistency. (The result may be a mixture of old and new elements, but you won't end up with memory corruption.)

Suppose ConcatArray is trusted to do bulk copying out of the strings in the array it's passed, without checking for buffer overflow. Then if you change the input array at just the right time, you could end up writing outside the allocated memory. Badness. With this defensive copy, the system can be sure1 that the total length really is the total length.


1 Well, unless reflection is used to change the contents of a string. But that can't be done without fairly high permissions - whereas changing the contents of an array is easy.



回答2:

Why did they create a new array?

I can confirm Jon's conjecture; I have the original source code in front of me. The comments indicate that the reason for the copy is because some foolish person could be mutating the array that was passed in on another thread. What could then happen? The calculation of length could say that there will be a hundred bytes of string data in the result, but by the time the copy happens, there could be a million bytes of string data in the array.

That would be bad. The problem is prevented easily by making a copy.



回答3:

They created a new array to normalize out null entries into String.Empty. This couldn't be done on the provided values array because then they would be modifying the input.



回答4:

Inefficient

No, it doesn't matter. That array creation and copy is blazing fast relative to the concatenation, it's only copying references.

It looks like they do it to convert null strings in the input array into String.Empty (they can't do this on values because it would be modifying the input which is a no no), and to detect concatentations that would overflow before actually doing the concatenation (that's what the if(totalLength < 0) test is for). Additionally, they can use totalLength to allocate the memory for the concatenated string up front, which is more efficient.



回答5:

Probably to make sure it doesn't change over the lifetime of the method. Which would lead to totalLength not fitting the content of the array anymore.

I suspect that ConcatArray uses some unsafe memory copying, and doesn't recheck the string Lengths again. One could rewrite it to avoid the allocation, but one additional small, short-lived allocation is pretty cheap.



回答6:

as far as I can see they do some work with this - guess you don't want to mutate the original one ... remember you want your strings immutable same for the Concat-function of course - don't change a parameter-reference if not stated ....