I was curious how the StringBuilder class is implemented internally, so I decided to check out Mono's source code and compare it with Reflector's disassembly of Microsoft's implementation. Essentially, Microsoft's implementation uses a char[] to store the string representation internally, and a bunch of unsafe methods to manipulate it. This is straightforward and did not raise any questions. But I was confused when I found that Mono uses a string inside StringBuilder:
private int _length;
private string _str;
My first thought was: "What a senseless StringBuilder". But then I figured out that it is possible to mutate a string using pointers:
public StringBuilder Append (string value)
{
// ...
String.CharCopy (_str, _length, value, 0, value.Length);
}
internal static unsafe void CharCopy (char *dest, char *src, int count)
{
// ...
((short*)dest) [0] = ((short*)src) [0]; dest++; src++;
}
I used to program in C/C++ a little, so I can't say that this code confused me much, but I thought that strings were completely immutable (i.e. that there is absolutely no way to mutate them). So the actual questions are:
- Can I create a completely immutable type?
- Is there any reason to use such code apart from performance concerns? (unsafe code to change immutable types)
- Are strings then inherently thread-safe or not?
There is no black magic at work here. The string class is immutable simply because it doesn't have any public fields, properties or methods that allow you to modify the internal string. Any method that "mutates" a string returns a new string instance. You can of course do this with your own classes as well.
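As a minimal sketch of that idea, here is a small class that follows the same pattern as string: state is set once in the constructor, nothing public can change it, and the "modifying" method returns a new instance. The `Point` type and its `WithX` method are hypothetical names made up for this illustration.

```csharp
using System;

// Hypothetical immutable type: all state is fixed in the constructor.
public sealed class Point
{
    private readonly int _x;
    private readonly int _y;

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
    }

    // Get-only properties: callers can read but never write.
    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "Mutation" builds a new instance and leaves this one untouched.
    public Point WithX(int x)
    {
        return new Point(x, _y);
    }
}

public static class Program
{
    public static void Main()
    {
        var p = new Point(1, 2);
        var q = p.WithX(10);
        Console.WriteLine(p.X + "," + p.Y); // prints "1,2": the original is unchanged
        Console.WriteLine(q.X + "," + q.Y); // prints "10,2"
    }
}
```

Marking the class `sealed` is a design choice here: it keeps a subclass from adding mutable state behind the same type name.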
You can create a type where the CLR enforces immutability on it. You can then use "unsafe" to turn off the CLR enforcement mechanisms. That's why "unsafe" is called "unsafe" - because it turns off the safety system. In unsafe code every single byte of memory in the process can be writable if you try hard enough, including both the immutable bytes and the code in the CLR which enforces immutability.
You can also use Reflection to break immutability. Both Reflection and unsafe code require an extremely high level of trust to be granted.
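A hedged sketch of the reflection route: the `Money` class and the `Tamper` helper below are hypothetical, and whether the write succeeds depends on the runtime and on the trust level the code runs with. On the runtimes I am aware of, setting a private readonly instance field this way goes through.

```csharp
using System;
using System.Reflection;

// Hypothetical type that looks immutable from the outside.
public sealed class Money
{
    private readonly decimal _amount;

    public Money(decimal amount) { _amount = amount; }

    public decimal Amount { get { return _amount; } }
}

public static class ReflectionDemo
{
    // Overwrites the private readonly field through private reflection.
    public static decimal Tamper(Money m, decimal newAmount)
    {
        FieldInfo field = typeof(Money).GetField(
            "_amount", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(m, newAmount);
        return m.Amount;
    }

    public static void Main()
    {
        var m = new Money(5m);
        Console.WriteLine(m.Amount);
        Console.WriteLine(Tamper(m, 500m)); // the "immutable" object now reports the new value
    }
}
```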
Sure, there are lots of reasons to use immutable data structures. Immutable data structures rock. Some good reasons to use immutable data structures:
The fact that the answer to a question about an immutable type stays true forever has security implications. Suppose you have code that first checks whether a Bar object is safe and then does something dangerous with it.
If Bar is a mutable type then there is a race condition here; bar could be made unsafe on another thread after the check but before something dangerous happens. If Bar is an immutable type then the answer to the question stays the same throughout, which is much safer. (Imagine if you could mutate a string containing a path after the security check but before the file was opened, for example.)
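The check-then-use pattern described above can be sketched roughly as follows. This is an illustration only: `Bar`, `IsSafe`, `DoSomethingDangerous` and the `/tmp/` rule are hypothetical names and assumptions, not code from any real library.

```csharp
using System;

// Hypothetical mutable type: anyone holding a reference can change Path.
public class Bar
{
    public string Path;
}

public static class Frobber
{
    // Assumed safety rule, purely for illustration.
    static bool IsSafe(Bar bar)
    {
        return bar.Path.StartsWith("/tmp/", StringComparison.Ordinal);
    }

    static void DoSomethingDangerous(Bar bar)
    {
        Console.WriteLine("Opening " + bar.Path);
    }

    public static void Frob(Bar bar)
    {
        if (!IsSafe(bar))
            throw new InvalidOperationException("unsafe bar");

        // Race window: with a mutable Bar, another thread could change
        // bar.Path here, after the check but before the dangerous call.
        DoSomethingDangerous(bar);
    }

    public static void Main()
    {
        Frob(new Bar { Path = "/tmp/data.txt" });
    }
}
```

If `Bar` were immutable, the value that `IsSafe` inspected would necessarily be the value that `DoSomethingDangerous` receives.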
Methods which take immutable data structures as their arguments and return them as their results and perform no side effects are called "pure methods". Pure methods can be memoized, which trades increased memory use for increased speed, often enormously increased speed.
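A minimal sketch of memoization, assuming a single-argument function and a single thread (the dictionary cache is not synchronized). `Memoizer.Memoize` is a hypothetical helper written for this example; the `calls` counter exists only to show that the underlying function runs once.

```csharp
using System;
using System.Collections.Generic;

public static class Memoizer
{
    // Wraps a pure function so each distinct argument is computed only once.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> pure)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            TResult result;
            if (!cache.TryGetValue(arg, out result))
            {
                result = pure(arg);
                cache[arg] = result;
            }
            return result;
        };
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0; // instrumentation only, to count real computations

        Func<string, int> countVowels = s =>
        {
            calls++;
            int n = 0;
            foreach (char c in s)
                if ("aeiou".IndexOf(c) >= 0) n++;
            return n;
        };

        Func<string, int> fast = Memoizer.Memoize(countVowels);
        Console.WriteLine(fast("immutable")); // prints 4 (computed)
        Console.WriteLine(fast("immutable")); // prints 4 (served from the cache)
        Console.WriteLine(calls);             // prints 1
    }
}
```

Caching by argument is only sound because the argument cannot change after it has been used as a key; a mutable key could silently invalidate the cache.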
Immutable data structures can often be used on multiple threads simultaneously without locking. Locking is there to prevent creation of inconsistent state of an object in the face of a mutation, but immutable objects don't have mutations. (Some so-called immutable data structures are logically immutable but actually do mutations inside themselves; imagine for example a lookup table which does not change its contents, but does reorganize its internal structure if it can deduce what the next query is likely to be. Such a data structure would not be automatically threadsafe.)
Immutable data structures that efficiently re-use their internal parts when a new structure is built from an old one make it easy to "take a snapshot" of the state of a program without wasting lots of memory. That makes undo-redo operations trivial to implement. It makes it easier to write debugging tools that can show you how you got to a particular program state.
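As a rough illustration of structural sharing, here is a tiny persistent stack. `PersistentStack<T>` is a hypothetical type written for this sketch, not a framework class: each `Push` allocates one node and re-uses the whole previous stack as its tail, so every older version remains a valid snapshot.

```csharp
using System;

// Hypothetical persistent (immutable) stack with structural sharing.
public sealed class PersistentStack<T>
{
    // The empty stack is the only instance whose tail is null.
    public static readonly PersistentStack<T> Empty =
        new PersistentStack<T>(default(T), null);

    private readonly T _head;
    private readonly PersistentStack<T> _tail;

    private PersistentStack(T head, PersistentStack<T> tail)
    {
        _head = head;
        _tail = tail;
    }

    public bool IsEmpty { get { return _tail == null; } }

    // O(1): the new stack shares all existing nodes with this one.
    public PersistentStack<T> Push(T value)
    {
        return new PersistentStack<T>(value, this);
    }

    public T Peek()
    {
        if (IsEmpty) throw new InvalidOperationException("empty stack");
        return _head;
    }

    // O(1): "popping" just hands back the previous version.
    public PersistentStack<T> Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("empty stack");
        return _tail;
    }
}

public static class Program
{
    public static void Main()
    {
        var v1 = PersistentStack<string>.Empty.Push("a");
        var v2 = v1.Push("b");

        Console.WriteLine(v2.Peek()); // prints "b"
        Console.WriteLine(v1.Peek()); // prints "a": the old snapshot is intact
        Console.WriteLine(object.ReferenceEquals(v2.Pop(), v1)); // prints "True"
    }
}
```

Undo then amounts to keeping a reference to an earlier version, such as `v1`, and switching back to it.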
And so on.
If everyone plays by the rules, strings are thread-safe. If someone uses unsafe code or private reflection then there is no rule enforcement anymore. You have to trust that if someone is using high-privilege code then they are doing so correctly and not mutating a string. Use your power to run unsafe code only for good; with great power comes great responsibility.
That is a strange question. Remember, locks are co-operative. Locks only work if everyone accessing a particular object agrees upon the locking strategy that must be used.
You have to use locks if the agreed-upon locking strategy for accessing a particular object in a particular storage location is to use locks. If that isn't the agreed-upon locking strategy then using locks is pointless; you're carefully locking and unlocking the front door while someone else is walking in the open back door.
If you have a string which you know is being mutated by unsafe code, and you don't want to see inconsistent partial mutations, and the code which is doing the unsafe mutation documents that it takes out a particular lock during that mutation, then yes, you need to use locks when accessing that string. But this situation is very rare; ideally no one would use unsafe code to manipulate a string accessible by other code on another thread, because doing so is an incredibly bad idea. That's why we require that code that does so is fully trusted. And that's why we require that the C# source code for such a function wave a big red flag that says "this code is unsafe, review it carefully!"
Yes. Have a constructor to set private fields, get-only properties, and no mutating methods.
One example: such types don't require locks to be safely used from multiple concurrent threads. This makes correct code easier to write (no locks to get wrong).
Additional: it is always possible for sufficiently privileged code to bypass .NET protections: either reflection to read and write to private fields, or unsafe code to directly manipulate an object's memory.
This is true outside of .NET too: a privileged process (i.e. one whose process or thread token has one of the "God" privileges, e.g. Take Ownership, enabled) can break into any other process and load DLLs, inject threads running arbitrary code, or read and write memory (including overriding execute prevention etc.). The integrity of the system is only as strong as the cooperation of the owner of the system.
You can read these posts: "Immutable types: understand their benefits and use them" and "Manage states in a multi-threaded environment without the synchronization pain".
Also, the tool NDepend comes with some facilities to cope with immutable types and pure methods.
There is no completely immutable type. A class is immutable only because it doesn't allow any outside code to alter it; using reflection or unsafe code you can still change its values.
You can use the readonly keyword to create an immutable variable, but that works only for value types. If you use it on a reference type, only the reference is protected, not the object that it points to.
There are several reasons for immutable types, like performance and robustness.
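To illustrate the point about readonly and reference types, here is a small sketch. The `Holder` class is hypothetical: its readonly field can never be pointed at a different list, yet the list it points to can still be changed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: the field is readonly, the List<int> it refers to is not.
public class Holder
{
    private readonly List<int> _items = new List<int>();

    public int Count { get { return _items.Count; } }

    public void Add(int value)
    {
        // Allowed: this mutates the object the reference points to.
        _items.Add(value);

        // Not allowed outside a constructor or initializer (compile error):
        // _items = new List<int>();
    }
}

public static class Program
{
    public static void Main()
    {
        var h = new Holder();
        h.Add(1);
        h.Add(2);
        Console.WriteLine(h.Count); // prints 2: state changed despite readonly
    }
}
```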
The fact that strings are known to be immutable (outside the StringBuilder) means that the compiler can make optimisations based on that. The compiler never has to produce code to copy a string to protect it from being changed when it's passed as a parameter.
Objects created from immutable types can also be safely passed between threads. As they can't be changed, there is no risk of different threads changing them at the same time, so there is no need to synchronise access to them.
Immutable types can be used to avoid coding errors. If you know that a value should not be changed, it's generally a good idea to make sure that it can't be changed by mistake.
If you go unsafe, it is possible to mutate strings in C# too (IIRC).
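A hedged sketch of what that looks like. It has to be compiled with unsafe code enabled (for example the `/unsafe` compiler switch), and writing through the pointer is outside what the language guarantees, so treat the result as "what typically happens" rather than defined behaviour. `OverwriteFirstChar` is a hypothetical helper; the string is built at run time so that no shared string literal gets corrupted.

```csharp
using System;

public static class UnsafeDemo
{
    // Pins the string and writes straight into its character buffer.
    public static unsafe void OverwriteFirstChar(string s, char c)
    {
        if (string.IsNullOrEmpty(s)) return;

        fixed (char* p = s)
        {
            p[0] = c;
        }
    }

    public static void Main()
    {
        string s = new string('a', 3); // built at run time, not a literal
        string alias = s;              // second reference to the same object

        OverwriteFirstChar(s, 'b');

        // Every reference observes the change, because the object itself was altered.
        Console.WriteLine(alias);
    }
}
```

Doing the same to a string literal is worse, since literals can be shared across the program and every use of that literal may then see the altered text.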