In C#, why is String a reference type that behaves like a value type?

Posted 2018-12-31 05:08

A String is a reference type even though it has most of the characteristics of a value type such as being immutable and having == overloaded to compare the text rather than making sure they reference the same object.

Why isn't string just a value type then?

12 answers
无色无味的生活
#2 · 2018-12-31 05:55

The distinction between reference types and value types is basically a performance tradeoff in the design of the language. Reference types have some overhead on construction, destruction and garbage collection, because they are created on the heap. Value types, on the other hand, have overhead on method calls (if the data size is larger than a pointer), because the whole object is copied rather than just a pointer. Because strings can be (and typically are) much larger than the size of a pointer, they are designed as reference types. Also, as Servy pointed out, the size of a value type must be known at compile time, which is not always the case for strings.
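To make the copy-versus-pointer tradeoff concrete, here is a minimal sketch. PointValue, PointRef and CopyDemo are hypothetical names made up for this illustration; they are not from the framework.

```csharp
using System;

// Hypothetical types, for illustration only.
struct PointValue { public int X; }
class PointRef { public int X; }

static class CopyDemo
{
    // Assigning a struct copies all of its data.
    public static int MutateStructCopy()
    {
        var a = new PointValue { X = 1 };
        var b = a;   // b is an independent copy of a
        b.X = 99;    // changes only the copy
        return a.X;  // a is unchanged
    }

    // Assigning a class instance copies only the reference.
    public static int MutateClassAlias()
    {
        var a = new PointRef { X = 1 };
        var b = a;   // a and b now refer to the same heap object
        b.X = 99;    // visible through both variables
        return a.X;
    }

    static void Main()
    {
        Console.WriteLine(MutateStructCopy()); // prints 1
        Console.WriteLine(MutateClassAlias()); // prints 99
    }
}
```

The same copying happens whenever a struct is passed to a method by value, which is where the cost for large value types comes from.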

The question of mutability is a separate issue. Both reference types and value types can be either mutable or immutable. Value types are typically immutable though, since the semantics for mutable value types can be confusing.

Reference types are generally mutable, but can be designed as immutable if it makes sense. Strings are defined as immutable because it makes certain optimizations possible. For example, if the same string literal occurs multiple times in the same program (which is quite common), the compiler can reuse the same object.
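The literal-reuse optimization can be observed directly. This is a small sketch (InternDemo is a made-up class name); it relies on the C# rule that equal string literals in the same assembly refer to the same instance, and on String.Intern for strings built at run time.

```csharp
using System;

static class InternDemo
{
    // Two identical literals in the same assembly share one string object.
    public static bool LiteralsShareObject()
    {
        string a = "hello";
        string b = "hello";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time is a separate object, but String.Intern
    // returns the pooled instance that the literal also uses.
    public static bool InternReturnsPooledInstance()
    {
        string literal = "hello";
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return object.ReferenceEquals(literal, string.Intern(built));
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareObject());         // prints True
        Console.WriteLine(InternReturnsPooledInstance()); // prints True
    }
}
```

Sharing one object this way is only safe because nobody can modify the string through either reference.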

So why is "==" overloaded to compare strings by text? Because those are the most useful semantics. If two strings are equal by text, they may or may not be the same object reference, depending on the optimizations. So comparing references is pretty useless, while comparing text is almost always what you want.
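A short sketch of the difference between text equality and reference equality (EqualityDemo and BuildAtRuntime are illustrative names, not framework APIs):

```csharp
using System;

static class EqualityDemo
{
    // Builds "hello" at run time so it is a different object from the literal.
    static string BuildAtRuntime()
    {
        return new string(new[] { 'h', 'e', 'l', 'l', 'o' });
    }

    // string's overloaded == compares the characters.
    public static bool TextEqual()
    {
        string a = "hello";
        string b = BuildAtRuntime();
        return a == b;
    }

    // Casting to object forces a reference comparison instead.
    public static bool SameObject()
    {
        string a = "hello";
        string b = BuildAtRuntime();
        return (object)a == (object)b;
    }

    static void Main()
    {
        Console.WriteLine(TextEqual());  // prints True
        Console.WriteLine(SameObject()); // prints False
    }
}
```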

Speaking more generally, strings have what is termed value semantics. This is a more general concept than value types, which are a C#-specific implementation detail. Value types have value semantics, but reference types may also have value semantics. When a type has value semantics, you can't really tell if the underlying implementation is a reference type or a value type, so you can consider that an implementation detail.

路过你的时光
#3 · 2018-12-31 05:56

This is a late answer to an old question, but all other answers are missing the point, which is that .NET did not have generics until .NET 2.0 in 2005.

String is a reference type instead of a value type because it was of crucial importance for Microsoft to ensure that strings could be stored in the most efficient way in non-generic collections, such as System.Collections.ArrayList.

Storing a value type in a non-generic collection requires a special conversion to the type object, which is called boxing. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.

Reading the value from the collection requires the inverse operation which is called unboxing.

Both boxing and unboxing have non-negligible cost: boxing requires an additional allocation, unboxing requires type checking.
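The boxing and unboxing described above can be sketched with ArrayList. BoxingDemo is a hypothetical class name, and the observation that each box is a separate heap object reflects how current runtimes behave rather than a documented guarantee.

```csharp
using System;
using System.Collections;

static class BoxingDemo
{
    // Each Add of an int boxes it into a new heap object.
    public static bool BoxedIntsAreDistinctObjects()
    {
        int n = 42;
        var list = new ArrayList();
        list.Add(n); // boxing: allocates an object holding 42
        list.Add(n); // boxing again: a second, separate object
        return !object.ReferenceEquals(list[0], list[1]);
    }

    // A string is already a reference, so the list just stores it.
    public static bool StringIsStoredByReference()
    {
        string s = "hello";
        var list = new ArrayList();
        list.Add(s); // no boxing, no extra allocation for the element
        return object.ReferenceEquals(list[0], s);
    }

    // Unboxing checks that the box holds exactly the requested type.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42; // a boxed int
        try
        {
            long wrong = (long)boxed; // int box cannot be unboxed as long
            Console.WriteLine(wrong);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(BoxedIntsAreDistinctObjects());
        Console.WriteLine(StringIsStoredByReference());
        Console.WriteLine(UnboxToWrongTypeThrows());
    }
}
```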

Some answers claim incorrectly that string could never have been implemented as a value type because its size is variable. Actually it is easy to implement string as a fixed-length data structure using a Small String Optimization strategy: strings would be stored in memory directly as a sequence of Unicode characters except for large strings that would be stored as a pointer to an external buffer. Both representations can be designed to have the same fixed length, i.e. the size of a pointer.

If generics had existed from day one I guess having string as a value type would probably have been a better solution, with simpler semantics, better memory usage and better cache locality. A List<string> containing only small strings could have been a single contiguous block of memory.

低头抚发
#4 · 2018-12-31 06:01

Strings aren't value types since they can be huge, and need to be stored on the heap. Value types are (in all implementations of the CLR as of yet) typically stored on the stack when they are local variables. Stack allocating strings would break all sorts of things: the stack is only 1 MB for 32-bit and 4 MB for 64-bit; you'd have to box each string, incurring a copy penalty; you couldn't intern strings; memory usage would balloon; and so on.

(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)

临风纵饮
#5 · 2018-12-31 06:01

Also, consider the way strings are implemented (different for each platform) and what happens when you start stitching them together, for example with a StringBuilder. It allocates a buffer for you to copy into; once you reach the end, it allocates even more memory for you, in the hope that performance won't be hindered if you do a large concatenation.
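The buffer growth can be seen through StringBuilder.Capacity. This is only a sketch (BuilderDemo is a made-up name); the exact growth policy is an implementation detail, so the code checks only that the capacity increased.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Start with a deliberately tiny buffer and append past its end.
    public static bool CapacityGrew()
    {
        var sb = new StringBuilder(4);
        int initial = sb.Capacity;
        sb.Append("a fairly long piece of text");
        // The builder had to obtain more room to hold the appended text.
        return sb.Capacity > initial && sb.Capacity >= sb.Length;
    }

    // The final immutable string is produced only once, by ToString.
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.Append("foo");
        sb.Append("bar");
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(CapacityGrew()); // prints True
        Console.WriteLine(Build());        // prints foobar
    }
}
```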

Maybe Jon Skeet can help us out here?

千与千寻千般痛.
#6 · 2018-12-31 06:04

How can you tell string is a reference type? I'm not sure that it matters how it is implemented. Strings in C# are immutable precisely so that you don't have to worry about this issue.

与风俱净
#7 · 2018-12-31 06:05

Isn't it just as simple as strings being made up of character arrays? I look at strings as character arrays (char[]). Therefore they are on the heap: the reference is stored on the stack and points to the beginning of the array's memory location on the heap. The string size is not known before it is allocated, which is perfect for the heap.

That is why a string is really immutable: when you change it, even if the new value is the same size, the compiler doesn't know that and has to allocate a new array and assign characters to the positions in the array. It makes sense if you think of strings as a way that languages protect you from having to allocate memory on the fly (read: C-like programming).
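A minimal sketch of what "changing" a string actually does (ImmutableDemo is an illustrative name): every operation that appears to modify a string returns a new object and leaves the original untouched.

```csharp
using System;

static class ImmutableDemo
{
    // ToUpperInvariant returns a new string; the original keeps its text.
    public static string OriginalAfterToUpper()
    {
        string original = "hello";
        string upper = original.ToUpperInvariant();
        Console.WriteLine(upper); // HELLO
        return original;
    }

    // += builds a new string and rebinds the variable; other references
    // to the old string still see the old text.
    public static string AliasAfterConcat()
    {
        string text = "hello";
        string alias = text;
        text += " world";
        Console.WriteLine(text); // hello world
        return alias;
    }

    static void Main()
    {
        Console.WriteLine(OriginalAfterToUpper()); // prints hello
        Console.WriteLine(AliasAfterConcat());     // prints hello
        // Note: text[0] = 'H' would not compile, because the string
        // indexer is read-only.
    }
}
```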
