A String is a reference type even though it has most of the characteristics of a value type, such as being immutable and having == overloaded to compare the text rather than checking whether both sides reference the same object.
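For example, a minimal sketch of that behaviour (the variable names are just for illustration):

```csharp
using System;

class EqualityDemo
{
    static void Main()
    {
        // Two distinct string objects that happen to contain the same characters.
        string a = "hello";
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        Console.WriteLine(a == b);                       // True  -- == compares the text
        Console.WriteLine(object.ReferenceEquals(a, b)); // False -- they are different objects
    }
}
```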
Why isn't string just a value type then?
Strings aren't value types since they can be huge and need to be stored on the heap. Value types are (in all implementations of the CLR so far) stored on the stack. Stack-allocating strings would break all sorts of things: the stack is only 1 MB for 32-bit and 4 MB for 64-bit, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, and memory usage would balloon.
(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)
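As a side note on the interning point above, here is a minimal sketch (the variable names are just for illustration):

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // Identical string literals are interned: both variables point at the same object.
        string a = "hello";
        string b = "hello";
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        // A string built at run time is a separate object until it is explicitly interned.
        string c = new string("hello".ToCharArray());
        Console.WriteLine(object.ReferenceEquals(a, c));                // False
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}
```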
In very simple words, any value that has a definite size can be treated as a value type.
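A minimal sketch of that idea (sizeof is only used here with primitive types, whose sizes are known at compile time):

```csharp
using System;

class SizeDemo
{
    static void Main()
    {
        // Primitive value types have a size that is fixed at compile time.
        Console.WriteLine(sizeof(int));    // 4
        Console.WriteLine(sizeof(double)); // 8

        // A string's size depends on its contents, so it cannot be known up front.
        Console.WriteLine("hi".Length);          // 2
        Console.WriteLine("hello world".Length); // 11
    }
}
```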
Isn't it just as simple as this: strings are made up of character arrays? I look at strings as character arrays (char[]). Therefore they are on the heap, because the reference is stored on the stack and points to the beginning of the array's memory on the heap. The string size is not known before it is allocated... perfect for the heap.
That is why a string is really immutable: when you change it, even if the new value is the same size, the compiler doesn't know that and has to allocate a new array and assign characters to the positions in that array. It makes sense if you think of strings as a way that languages protect you from having to allocate memory on the fly (read: C-like programming).
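A small sketch of that behaviour (the variable names are illustrative):

```csharp
using System;

class ImmutabilityDemo
{
    static void Main()
    {
        string s = "hello";

        // "Changing" the string actually produces a brand-new string object;
        // the original characters are never modified in place.
        string t = s.Replace('h', 'j');

        Console.WriteLine(s);                            // hello (unchanged)
        Console.WriteLine(t);                            // jello
        Console.WriteLine(object.ReferenceEquals(s, t)); // False -- a new allocation
    }
}
```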
This is a late answer to an old question, but all other answers are missing the point, which is that .NET did not have generics until .NET 2.0 in 2005.
String is a reference type instead of a value type because it was of crucial importance for Microsoft to ensure that strings could be stored in the most efficient way in non-generic collections, such as System.Collections.ArrayList.

Storing a value type in a non-generic collection requires a special conversion to the type object, which is called boxing. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap. Reading the value back from the collection requires the inverse operation, which is called unboxing. Both boxing and unboxing have a non-negligible cost: boxing requires an additional allocation, and unboxing requires type checking.
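A small sketch of that cost with a pre-generics collection (the variable names are illustrative):

```csharp
using System;
using System.Collections;

class BoxingDemo
{
    static void Main()
    {
        var list = new ArrayList();

        list.Add(42);      // int (a value type) is boxed here: a wrapper object is allocated on the heap
        list.Add("hello"); // string (a reference type) is stored as-is, no boxing

        // Reading the int back requires unboxing, which includes a runtime type check.
        int n = (int)list[0];
        string s = (string)list[1]; // just a cast of the reference, no unboxing

        Console.WriteLine(n + " " + s);
    }
}
```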
Some answers claim incorrectly that string could never have been implemented as a value type because its size is variable. Actually, it is easy to implement string as a fixed-length data structure using a small string optimization strategy: strings would be stored in memory directly as a sequence of Unicode characters, except for large strings, which would be stored as a pointer to an external buffer. Both representations can be designed to have the same fixed length, i.e. the size of a pointer.

If generics had existed from day one, I guess having string as a value type would probably have been a better solution, with simpler semantics, better memory usage and better cache locality. A List<string> containing only small strings could have been a single contiguous block of memory.

At the risk of getting yet another mysterious down-vote... the fact that many mention the stack and memory with respect to value types and primitive types is because they must fit into a register in the microprocessor. You cannot push or pop something to/from the stack if it takes more bits than a register has; the instructions are, for example, "pop eax", because eax is 32 bits wide on a 32-bit system.
Floating-point primitive types are handled by the FPU, which is 80 bits wide.
This was all decided long before there was an OOP language to obfuscate the definition of a primitive type, and I assume that "value type" is a term created specifically for OOP languages.
Actually, strings have very few resemblances to value types. For starters, not all value types are immutable: you can change the value of an Int32 all you want, and it would still be at the same address on the stack.
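To illustrate that, here is a minimal sketch with a hypothetical mutable struct (the type name is made up for the example):

```csharp
using System;

// A hypothetical mutable value type: its fields can be changed in place,
// whereas a string can never be changed after it is created.
struct MutablePoint
{
    public int X;
    public int Y;
}

class MutationDemo
{
    static void Main()
    {
        var p = new MutablePoint { X = 1, Y = 2 };
        p.X = 10; // mutates the value in place; no new object is created

        Console.WriteLine(p.X + "," + p.Y); // 10,2
    }
}
```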
Strings are immutable for a very good reason. It has nothing to do with string being a reference type, but it has a lot to do with memory management: it's just more efficient to create a new object when the string's size changes than to shift things around on the managed heap. I think you're mixing together the value/reference type and immutable object concepts.
As far as "==" goes: like you said, "==" is an operator overload, and again it was implemented for a very good reason: to make the framework more useful when working with strings.