Changed behavior of string.Empty (or System.String

Posted 2019-01-13 14:57

Question:

Short version:

The C# code

typeof(string).GetField("Empty").SetValue(null, "Hello world!");
Console.WriteLine(string.Empty);

when compiled and run, gives output "Hello world!" under .NET version 4.0 and earlier, but gives "" under .NET 4.5 and .NET 4.5.1.

How can a write to a field be ignored like that? Or, who resets this field?

Longer version:

I have never really understood why the string.Empty field (also known as [mscorlib]System.String::Empty) is not const (a.k.a. literal), see "Why isn't String.Empty a constant?". This means that, for example, in C# we can't use string.Empty in the following situations:

  • In a switch statement in the form case string.Empty:
  • As the default value of an optional parameter, like void M(string x = string.Empty) { }
  • When applying an attribute, like [SomeAttribute(string.Empty)]
  • Other situations where a compile-time constant is required

which has implications for the well-known "religious war" over whether to use string.Empty or "", see "In C#, should I use string.Empty or String.Empty or "" to intitialize a string?".
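
To illustrate the compile-time-constant restriction listed above, here is a minimal sketch of the usual workaround: declaring your own const string. The names ConstEmptyDemo, EmptyConst and Describe are made up for this example; replacing EmptyConst with string.Empty in either marked position should produce a compile error.

```csharp
using System;

static class ConstEmptyDemo
{
    // Hypothetical name. Unlike string.Empty, this is a compile-time constant.
    const string EmptyConst = "";

    // The default value of an optional parameter must be a constant.
    static string Describe(string s = EmptyConst)
    {
        switch (s)
        {
            case EmptyConst:   // case labels must be constants as well
                return "empty";
            default:
                return "non-empty";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe());       // falls back to the default value
        Console.WriteLine(Describe("abc"));
    }
}
```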

A couple of years ago I amused myself by setting Empty to some other string instance through reflection, to see how many parts of the BCL started behaving strangely because of it. Quite a lot of them did. And the change of the Empty reference seemed to persist for the complete life of the application. Now, the other day I tried to repeat that little stunt, this time on a machine with .NET 4.5, and I couldn't do it anymore.

(NB! If you have .NET 4.5 on your machine, probably your PowerShell still uses an older version of .NET, so try copy-pasting [String].GetField("Empty").SetValue($null, "Hello world!") into PowerShell to see some effects of changing this reference.)

When I tried to search for a reason for this, I stumbled upon the interesting thread "What's the cause of this FatalExecutionEngineError in .NET 4.5 beta?". In the accepted answer to that question, it is noted that through version 4.0, System.String had a static constructor .cctor in which the field Empty was set (in the C# source, that would probably just be a field initializer, of course), while in 4.5 no static constructor exists. In both versions, the field itself looks the same:

.field public static initonly string Empty

(as seen with IL DASM).

No field other than String::Empty seems to be affected. As an example, I experimented with System.Diagnostics.Debugger::DefaultCategory. This case seems analogous: a sealed class containing a static readonly (static initonly) field of type string. But in this case it works fine to change the value (reference) through reflection.
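
The DefaultCategory experiment could look roughly like the sketch below. The class name DefaultCategoryDemo is made up, and the catch block is an assumption on my part: runtimes newer than the ones discussed here may refuse to write a static initonly field through reflection at all, so the result depends on the runtime.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class DefaultCategoryDemo
{
    static void Main()
    {
        FieldInfo field = typeof(Debugger).GetField("DefaultCategory");

        // Confirm that the field has the same shape as String::Empty.
        Console.WriteLine("static: " + field.IsStatic + ", initonly: " + field.IsInitOnly);

        try
        {
            field.SetValue(null, "Hello world!");
        }
        catch (FieldAccessException ex)
        {
            // Assumption: some runtimes reject the write outright.
            Console.WriteLine("Write rejected: " + ex.GetType().Name);
        }

        // Read the field back both ways and compare by eye.
        Console.WriteLine("Direct read:     " + (Debugger.DefaultCategory ?? "(null)"));
        Console.WriteLine("Reflection read: " + (field.GetValue(null) ?? "(null)"));
    }
}
```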

Back to the question:

How is it possible, technically, that Empty doesn't seem to change (in 4.5) when I set the field? I have verified that the C# compiler does not "cheat" with the read, it outputs IL like:

ldsfld     string [mscorlib]System.String::Empty

so the actual field ought to be read.


Edit after bounty was put on my question: Note that the write operation (which needs reflection for sure, since the field is readonly (a.k.a. initonly in the IL)) actually works as expected. It is the read operation which is anomalous. If you read with reflection, as in typeof(string).GetField("Empty").GetValue(null), everything is normal (i.e. the change of value is seen). See comments below.

So the better question is: Why does this new version of the framework cheat when it reads this particular field?
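
To make the read anomaly concrete, here is a small sketch that writes the field once and then reads it both directly and through reflection. The class name EmptyReadDemo is made up. Overwriting Empty can destabilize the process, and newer runtimes may reject the write (hence the catch, which is an assumption on my part), so treat this as an experiment for a throwaway console project only.

```csharp
using System;
using System.Reflection;

static class EmptyReadDemo
{
    static void Main()
    {
        FieldInfo field = typeof(string).GetField("Empty");

        try
        {
            field.SetValue(null, "Hello world!");
        }
        catch (FieldAccessException ex)
        {
            // Assumption: some runtimes refuse to write static initonly fields.
            Console.WriteLine("Write rejected: " + ex.GetType().Name);
            return;
        }

        string direct = string.Empty;                        // compiles to ldsfld
        string viaReflection = (string)field.GetValue(null); // reads the field through reflection

        // If the two reads disagree, the direct read did not go through the field.
        Console.WriteLine(direct.Length);
        Console.WriteLine(viaReflection.Length);
        Console.WriteLine(object.ReferenceEquals(direct, viaReflection));
    }
}
```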

Answer 1:

The difference lies in the JIT for the new release of .NET, which apparently optimizes references to String.Empty by inlining a reference to a particular String instance rather than loading the value stored in the Empty field. This is justified under the definition of the init-only constraint in ECMA-335 Partition I §8.6.1.2, which can be interpreted to mean the value of the String.Empty field will not change after the String class is initialized.



Answer 2:

I don't have an answer, just a hint, maybe.

The only difference I see between String::Empty and System.Diagnostics.Debugger::DefaultCategory is that the first one is tagged with __DynamicallyInvokableAttribute.

I don't know the meaning of this undocumented attribute. A question about this attribute has been asked on SO: What is the __DynamicallyInvokable attribute for?

I can only suppose that this attribute is picked up by the runtime to do some caching?



Answer 3:

Because it can.

The values of these system-defined initonly fields are global invariants for the .NET runtime. If these invariants are broken, there are no longer any guarantees whatsoever regarding the behavior.

In C++, we would probably have a rule designating this as causing undefined behavior. In .NET, it is also undefined behavior, simply by the absence of any rule saying what happens when System.String.Empty.Length > 0. The whole specification of all layers of .NET and C# describes the behavior when System.String.Empty.Length == 0 and a whole bunch of other invariants also hold.

For more information about optimizations which vary between runtimes and the implications, see the answers to

  • What are the implications of asking Reflection APIs to overwrite System.String.Empty?