Let's consider some very simple C# code:
static void Main(string[] args)
{
    int i = 5;
    string s = "ABC";
    bool b = false;
}
Jeffrey Richter's "CLR via C#" (Chapter 14) states that "The String type is derived immediately from Object, making it a reference type, and therefore, String objects (its array of characters) always live in the heap, never on a thread's stack".
Also referring to strings, on an example in the book quite similar to the one above: "The newobj IL instruction constructs a new instance of an object. However, no newobj instruction appears in the IL code example. Instead, you see the special ldstr (load string) IL instruction, which constructs a String object by using a literal string obtained from metadata. This shows you that the common language runtime (CLR) does, in fact, have a special way of constructing literal String objects."
Looking at the IL code, this is clearly the case (only relevant part shown):
[...]
.locals init (
[0] int32,
[1] string,
[2] bool
)
// (no C# code)
IL_0000: nop
// int num = 5;
IL_0001: ldc.i4.5
IL_0002: stloc.0
// string text = "ABC";
IL_0003: ldstr "ABC"
IL_0008: stloc.1
// bool flag = false;
[...]
The ldstr IL instruction ensures that "an object reference to a string is pushed onto the stack". This makes sense: the string instance stays on the heap, and the reference to that object (its address) is stored in the variable on the stack.
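The difference between a literal loaded with ldstr and a string built through a constructor (newobj) can also be observed from C# itself. The following is a minimal sketch, not taken from the book; the names `LiteralDemo`, `SameLiteralObject` and `ConstructedIsDistinct` are made up for illustration. It assumes the usual runtime behaviour where identical literals in one assembly resolve to a single interned object.

```csharp
using System;

static class LiteralDemo
{
    // Both assignments compile to ldstr "ABC"; the runtime hands back
    // a reference to the same interned String object each time.
    public static bool SameLiteralObject()
    {
        string first = "ABC";
        string second = "ABC";
        return object.ReferenceEquals(first, second);
    }

    // A string built at run time goes through newobj and gets its own
    // heap object, even though its characters equal the literal's.
    public static bool ConstructedIsDistinct()
    {
        string literal = "ABC";
        string built = new string(new[] { 'A', 'B', 'C' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }

    static void Main()
    {
        Console.WriteLine(SameLiteralObject());
        Console.WriteLine(ConstructedIsDistinct());
    }
}
```

In both methods the local variables only hold references; the characters themselves live in heap objects, which matches the quote from the book.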
Now let's set a breakpoint on the line following the declaration of the string variable, start debugging in Visual Studio, and then switch to the Disassembly view. The relevant code follows (the full disassembled code is here):
017B0483 nop
int i = 5;
017B0484 mov dword ptr [ebp-40h],5
string s = "ABC";
017B048B mov eax,dword ptr ds:[429231Ch]
017B0491 mov dword ptr [ebp-44h],eax
bool b = false;
017B0494 xor edx,edx
017B0496 mov dword ptr [ebp-48h],edx
}
Looking specifically at the 2 assembly instructions handling the C# string line, the first one moves the content of the virtual memory at 429231C into the eax register, and the second stores that content on the stack, where the s variable lives.
Let's use WinDbg (x86, since the C# code uses Visual Studio's default 32-bit target platform) to look at that specific address, by attaching non-invasively to the process being debugged by VS. The content of 429231C above should be a reference to the memory location where the string actually lives. Let's check:
The second command does yield 41, 42 and 43 in hex, which represent A, B and C in ASCII; however, the order doesn't look right, and this might just be a coincidence. (1) It doesn't look as if the assembly code for the string line does things right.
If we use VMMap to look at that address:
The original address 429231C appears to be within the managed heap. But then (2) why would the content of an address on the heap be loaded as the reference stored in a stack variable, as the assembly code above seems to indicate?
The 2 questions I'm asking are (1) and (2). Despite the fact that everything makes sense to me right up to analyzing the IL code, things go downhill fast once I look at the disassembled code for that IL. I tend to think that I'm rather messing something up in my logic (most likely) or I'm hitting some sort of bug in the VS debugger (less likely).
Later Update: As very well pointed out by @madreflection and @Jester, endianness tripped me up. The hex representation checks out all right. Only question (2) now remains.
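To illustrate the endianness point, here is a small sketch that is not part of the original debugging session; the names `EndianDemo`, `RawBytes` and `FirstDword` are invented for this example. It assumes the memory was displayed as 32-bit values: .NET strings store UTF-16 code units, so each character takes two bytes, and a little-endian 32-bit read puts the second character in the high half of the value.

```csharp
using System;
using System.Text;

static class EndianDemo
{
    // The characters as laid out in memory: UTF-16, low byte first.
    public static string RawBytes(string s)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(s));
    }

    // The first four bytes combined the way a little-endian CPU
    // (such as x86) interprets them as one 32-bit value.
    public static string FirstDword(string s)
    {
        byte[] b = Encoding.Unicode.GetBytes(s);
        uint value = (uint)(b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24));
        return value.ToString("X8");
    }

    static void Main()
    {
        Console.WriteLine(RawBytes("ABC"));   // 41-00-42-00-43-00
        Console.WriteLine(FirstDword("ABC")); // 00420041
    }
}
```

Read as a 32-bit value, the first two characters appear as 00420041, so B shows up before A even though A comes first in memory.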
Later Update 2: The comments have been quite insightful, and I think @madreflection puts it best - there's an additional level of indirection - and the reasons for doing this (stated in the comments) start to make sense to me now. A quick diagram is below. I've also checked that both addresses do indeed belong to the managed heap with VMMap.
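One way to picture the extra level of indirection is the analogy below. It is only a sketch under stated assumptions: the array `literalSlots` is a hypothetical stand-in for the runtime-owned slot at 429231C, and the names `IndirectionDemo`, `LoadLiteral` and `LocalAliasesSlot` are invented. The real slot is managed by the CLR, not by user code.

```csharp
using System;

static class IndirectionDemo
{
    // Hypothetical stand-in for the slot that holds the reference
    // to the interned literal. The slot itself lives on the heap.
    static readonly string[] literalSlots = { "ABC" };

    public static string LoadLiteral()
    {
        // Roughly: mov eax, dword ptr ds:[slot]
        // Reads the reference stored in the slot, not the characters.
        string fromSlot = literalSlots[0];

        // Roughly: mov dword ptr [ebp-44h], eax
        // Copies that reference into the local variable on the stack.
        string s = fromSlot;
        return s;
    }

    // The local ends up pointing at the same object the slot points at.
    public static bool LocalAliasesSlot()
    {
        return object.ReferenceEquals(LoadLiteral(), literalSlots[0]);
    }

    static void Main()
    {
        Console.WriteLine(LocalAliasesSlot());
    }
}
```

In this picture the value copied to the stack is the content of the slot, which is itself a reference to the string object. That is why an address on the heap is dereferenced before anything is stored in the local.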
Later Update 3: Corrected previous diagram.