From a brief look using Reflector, it looks like String.Substring()
allocates memory for each substring. Am I correct that this is the case? I thought that wouldn't be necessary since strings are immutable.
My underlying goal was to create a IEnumerable<string> Split(this String, Char)
extension method that allocates no additional memory.
Each string has to have it's own string data, with the way that the String class is implemented.
You can make your own SubString structure that uses part of a string:
You can flesh it out with other methods like comparison that is also possible to do without extracting the string.
Not possible without poking around inside .net using String classes. You would have to pass around references to an array which was mutable and make sure no one screwed up.
.Net will create a new string every time you ask it to. Only exception to this is interned strings which are created by the compiler (and can be done by you) which are placed into memory once and then pointers are established to the string for memory and performance reasons.
Adding to the point that Strings are immutable, you should be that the following snippet will generate multiple String instances in memory.
s1+s2 => new string instance (temp1)
temp1 + s3 => new string instance (temp2)
res is a reference to temp2.
Because strings are immutable in .NET, every string operation that results in a new string object will allocate a new block of memory for the string contents.
In theory, it could be possible to reuse the memory when extracting a substring, but that would make garbage collection very complicated: what if the original string is garbage-collected? What would happen to the substring that shares a piece of it?
Of course, nothing prevents the .NET BCL team to change this behavior in future versions of .NET. It wouldn't have any impact on existing code.
One reason why most languages with immutable strings create new substrings rather than refer into existing strings is because this will interfere with garbage collecting those strings later.
What happens if a string is used for its substring, but then the larger string becomes unreachable (except through the substring). The larger string will be uncollectable, because that would invalidate the substring. What seemed like a good way to save memory in the short term becomes a memory leak in the long term.