When getting substring in .Net, does the new strin

Posted 2019-02-06 08:31

Assuming I have the following strings:

string str1 = "Hello World!";
string str2 = str1.Substring(6, 5); // "World"

I am hoping that in the above example str2 does not copy "World", but simply ends up being a new string that points into the same memory, just with an offset of 6 and a length of 5.

In actuality I am dealing with some potentially very long strings and am interested in how this works behind the scenes for performance reasons. I am not familiar enough with IL to look into this.

9 answers
Summer. ? 凉城
#2 · 2019-02-06 09:11

It references a brand new string.

劳资没心,怎么记你
#3 · 2019-02-06 09:12

It's a new string.

Strings, in .NET, are always immutable. Whenever you generate a new string via a method, including Substring, it will construct the new string in memory. The only time you share references to the same data in strings in .NET is if you explicitly assign a string variable to another string (in which case it's copying the reference), or if you work with string constants, which are typically interned. If you know your string is going to share a value with an interned string (a constant or literal from your code), you can retrieve the "shared" copy via String.Intern.

This is a good thing, btw - In order to do what you were describing, every string would require a reference (to the string data), as well as an offset + length. Right now, they only require a reference to the string data.

This would dramatically increase the size of strings in general, throughout the framework.
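A minimal sketch of the difference between sharing a reference and getting a fresh copy, assuming a C# 9 or later compiler so that top-level statements work. The variable names alias and again are illustrative only.

```csharp
using System;

string str1 = "Hello World!";

// Plain assignment copies the reference; no character data is duplicated.
string alias = str1;
Console.WriteLine(ReferenceEquals(str1, alias)); // True

// Each Substring call here allocates its own string object.
string str2 = str1.Substring(6, 5);
string again = str1.Substring(6, 5);
Console.WriteLine(str2 == again);                // True  (same characters)
Console.WriteLine(ReferenceEquals(str2, again)); // False (two separate objects)

// String.Intern returns the pooled instance for a given value, so two
// strings with equal content map to one shared reference.
string pooled1 = string.Intern(str2);
string pooled2 = string.Intern(again);
Console.WriteLine(ReferenceEquals(pooled1, pooled2)); // True
```

Interning trades a lookup in the intern pool for the shared reference, so it is mostly useful when the same values recur many times.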

时光不老,我们不散
#4 · 2019-02-06 09:18

Substring creates a new string, so new memory will be allocated for the new string.

淡お忘
#5 · 2019-02-06 09:20

As others have noted, the CLR makes copies when doing a substring operation.

As you note, it certainly would be possible for a string to be represented as an interior pointer with a length. This makes the substring operation extremely cheap.

There are also ways to make other operations cheap. For example, string concatenation can be made cheap by representing strings as a tree of substrings.

In both cases what is happening here is the result of the operation is not actually the "result" itself, per se, but rather, a cheap object which represents the ability to get at the results when needed.

The attentive reader will have just realized that this is how LINQ works. When we say

var results = from c in customers where c.City == "London" select c.Name;

"results" does not contain the results of the query. This code returns almost immediately; results contains an object which represents the query. Only when the query is iterated does the expensive mechanism of searching the collection spin up. We use the power of a monadic representation of sequence semantics to defer the calculations until later.
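The deferral can be observed directly. The following is a rough sketch, assuming C# 9 or later for top-level statements; the customerCities list and the predicateCalls counter are made up for illustration and stand in for the customers collection above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data standing in for the customers collection.
var customerCities = new List<string> { "London", "Paris", "London" };

// Count how often the filter actually runs.
int predicateCalls = 0;
Func<string, bool> isLondon = city => { predicateCalls++; return city == "London"; };

// Building the query does no filtering work.
var results = from c in customerCities where isLondon(c) select c;
int callsBeforeIteration = predicateCalls;
Console.WriteLine(callsBeforeIteration); // 0

// Iterating the query is what runs the filter over every element.
List<string> matches = results.ToList();
Console.WriteLine(predicateCalls);       // 3
Console.WriteLine(matches.Count);        // 2
```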

The question then becomes "is it a good idea to do the same thing on strings?" and the answer is a resounding "no". I have plenty of painful real-world experiments on this. I once spent a summer rewriting the VBScript compiler's string handling routines to store string concatenations as a tree of string concatenation operations; only when the result is actually being used as a string does the concatenation actually happen. It was disastrous; the additional time and memory needed to keep track of all the string pointers made the 99% case -- someone doing a few simple little string operations to render a web page -- about twice as slow, while massively speeding up the tiny, tiny minority of pages that were written using naive string concatenations.

The vast majority of realistic string operations in .NET programs are extremely fast; they compile down to memory moves that in normal circumstances stay well within the memory blocks that are cached by the processor, and are therefore blazingly fast.

Furthermore, using an "interior pointer" approach for strings complicates the garbage collector considerably; going with such an approach seems to make it likely that the GC would slow down overall, which benefits no one. You have to look at the total cost of the impact of the change, not just its impact on some narrow scenarios.

If you have specific performance needs due to your unusually large data then you should consider writing your own special-purpose string library that uses a "monadic" approach like LINQ does. You can represent your strings internally as arrays of char, and then substring operations simply become copying a reference to the array and changing the start and end positions.
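One way to sketch that idea without writing a new type is the framework's ArraySegment<char>, which stores an array reference plus an offset and a count. This is only an illustration under that assumption, not a full string library; the variable names are invented, and it assumes C# 9 or later for top-level statements.

```csharp
using System;

// Hypothetical setup: the long text lives in a char[] instead of a string.
char[] buffer = "Hello World!".ToCharArray();

// "Substring" without copying: same array, different offset and length.
var world = new ArraySegment<char>(buffer, 6, 5);
Console.WriteLine(ReferenceEquals(world.Array, buffer)); // True

// Characters are copied only when a real string is finally needed.
string materialized = new string(world.Array, world.Offset, world.Count);
Console.WriteLine(materialized); // World

// The segment is a view, so later writes to the buffer show through it.
buffer[6] = 'w';
string afterWrite = new string(world.Array, world.Offset, world.Count);
Console.WriteLine(afterWrite);   // world
```

The last step shows the cost of the approach: unlike System.String, the view is not immutable, and it keeps the whole backing array alive for as long as any segment refers to it. On .NET Core 2.1 and later, ReadOnlySpan<char> obtained through AsSpan offers a similar no-copy view over an ordinary string.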

淡お忘
#6 · 2019-02-06 09:23

As Reed said, strings are immutable. If you're dealing with long strings, consider using a StringBuilder; it might improve performance, depending of course on what you're trying to accomplish. If you can add some details to your question, you'll surely get suggestions on the best implementation.

贪生不怕死
#7 · 2019-02-06 09:25

In the CLR, strings are immutable, meaning they cannot be changed. When manipulating large strings, I would suggest looking at the StringBuilder class.
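As a small sketch of what that looks like, assuming C# 9 or later for top-level statements (the loop and its values are made up for illustration):

```csharp
using System;
using System.Text;

// StringBuilder appends into a growable internal buffer, so the loop
// does not allocate a new string on every iteration.
var builder = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    builder.Append(i).Append(',');
}

// A string is created only when ToString is called.
string joined = builder.ToString();

// ToString(startIndex, length) extracts just one range as a string.
string firstSix = builder.ToString(0, 6);
Console.WriteLine(firstSix); // 0,1,2,
```

Note that this helps when building or editing large text; taking a substring of an existing string still copies the characters.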
