I read that garbage collectors (GCs) move data in the heap for performance reasons. I don't quite understand why, since it is random-access memory; maybe it is for better sequential access. But I wonder whether references on the stack get updated when such a move occurs in the heap. Or maybe the object's address stays the same and only other parts of the data get moved; I am not sure.
I think this question pertains to implementation details, since not all garbage collectors may perform such an optimization, or they may do it without updating references (if that is common practice among garbage collector implementations). Still, I would like an overall answer specific to the CLR (Common Language Runtime) garbage collector.
I was also reading Eric Lippert's "References are not addresses" article here, and the following paragraph confused me a little bit:
If you think of a reference as actually being an opaque GC handle then it becomes clear that to find the address associated with the handle you have to somehow "fix" the object. You have to tell the GC "until further notice, the object with this handle must not be moved in memory, because someone might have an interior pointer to it". (There are various ways to do that which are beyond the scope of this screed.)
It sounds like, for reference types, we don't want the data to be moved. Then what else do we store in the heap that can be moved around for performance optimization? Maybe the type information we store there? By the way, in case you wonder what that article is about: Eric Lippert compares references to pointers and tries to explain why it may be wrong to say that references are just addresses, even though that is how C# implements them.
And also, if any of my assumptions above is wrong, please correct me.
Yes, references get updated during a garbage collection. Necessarily so, objects are moved when the heap is compacted. Compacting serves two major purposes:
In spite of Eric's didactic, an object reference really is just an address. A pointer, exactly the same kind you'd use in a C or C++ program. Very efficient, necessarily so. And all the GC has to do after moving an object is update the address stored in that pointer to the moved object. The CLR also permits allocating handles to objects, extra references. Exposed as the GCHandle type in .NET, but only necessary if the GC needs help determining if an object should stay alive or should not be moved. Only relevant if you interop with unmanaged code.
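To make the "update the address" step concrete, here is a toy mark-and-compact model in Python. It is only a sketch under loud assumptions: list indices stand in for addresses, and `ToyHeap`, `alloc`, `collect` and the `roots` dictionary are names invented for this illustration, not anything the CLR exposes or the way it lays out memory.

```python
class ToyHeap:
    """Toy heap: the index of a cell in self.cells plays the role of its address."""

    def __init__(self):
        self.cells = []

    def alloc(self, name, refs=()):
        # An "object" is a name plus a list of addresses of other objects.
        self.cells.append({"name": name, "refs": list(refs)})
        return len(self.cells) - 1

    def collect(self, roots):
        # Mark: find every object reachable from the roots.
        live, pending = set(), list(roots.values())
        while pending:
            addr = pending.pop()
            if addr not in live:
                live.add(addr)
                pending.extend(self.cells[addr]["refs"])

        # Compact: slide survivors toward address 0, remembering old -> new.
        survivors = sorted(live)
        forwarding = {old: new for new, old in enumerate(survivors)}
        self.cells = [self.cells[old] for old in survivors]

        # Fix up: rewrite every stored address, in object fields and in roots.
        for obj in self.cells:
            obj["refs"] = [forwarding[r] for r in obj["refs"]]
        for var in roots:
            roots[var] = forwarding[roots[var]]
        return forwarding


heap = ToyHeap()
heap.alloc("garbage")                        # address 0, unreachable
child = heap.alloc("child")                  # address 1
heap.alloc("more garbage")                   # address 2, unreachable
parent = heap.alloc("parent", refs=[child])  # address 3

roots = {"local_var": parent}                # models a local variable on the stack
forwarding = heap.collect(roots)

print(forwarding)  # {1: 0, 3: 1}
print(roots)       # {'local_var': 1}
```

The variable still refers to the same object after the collection; only the number stored in it changed, which is all "updating the reference" means in this model.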
What is not so simple is finding that pointer again. The CLR is heavily invested in ensuring that can be done reliably and efficiently. Such pointers can be stored in many different places. The easier ones to find are object references stored in a field of an object, a static variable or a GCHandle. The hard ones are pointers stored on the processor stack or in a CPU register, which happens for method arguments and local variables, for example.
One guarantee that the CLR needs to provide to make that happen is that the GC can always reliably walk the stack of a thread, so it can find the local variables that are stored in a stack frame. Then it needs to know where to look in such a stack frame; that's the job of the JIT compiler. When it compiles a method, it doesn't just generate the machine code for the method, it also builds a table that describes where those pointers are stored. You'll find more details about that in this post.
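The role of that table can be sketched with another toy model in Python. Everything here is hypothetical: `GC_INFO`, the method names, the frame layout and the `forwarding` map are made up for illustration and do not mirror the real format the JIT emits. The point is only that, without the table, the GC could not tell a reference apart from an integer that happens to hold the same number.

```python
# Hypothetical per-method tables, standing in for what a JIT compiler would emit:
# for each method, the frame slots that hold object references.
GC_INFO = {
    "Main": [0],        # slot 0 is a reference, slot 1 is a plain int
    "Compute": [1, 2],  # slot 0 is a plain int, slots 1 and 2 are references
}


def find_stack_roots(stack, gc_info):
    """Walk every frame and return (frame_index, slot_index) for each live reference."""
    found = []
    for frame_index, (method, slots) in enumerate(stack):
        for slot_index in gc_info[method]:
            if slots[slot_index] is not None:  # None models a null reference
                found.append((frame_index, slot_index))
    return found


def relocate_stack(stack, gc_info, forwarding):
    """Rewrite only the slots the table identifies as references."""
    for frame_index, slot_index in find_stack_roots(stack, gc_info):
        slots = stack[frame_index][1]
        slots[slot_index] = forwarding[slots[slot_index]]


# Two frames; the value 3 appears both as a reference and as an ordinary integer.
stack = [
    ("Main", [3, 3]),
    ("Compute", [1, 3, None]),
]
forwarding = {1: 0, 3: 1}  # old address -> new address, as produced by a compaction

relocate_stack(stack, GC_INFO, forwarding)
print(stack)  # [('Main', [1, 3]), ('Compute', [1, 1, None])]
```

Only the slots listed in the table were rewritten; the integers 3 in `Main` and 1 in `Compute` were left alone even though they look like addresses.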
Looking at C++/CLI in Action, there's a section about interior pointers vs. pinning pointers:
From that, you can conclude that reference types do move in the heap and their addresses do change. After the mark-and-sweep phase, the objects get compacted inside the heap, thus actually moving to new addresses. The CLR is responsible for keeping track of the actual storage location and updating those interior pointers using an internal table, making sure that, when accessed, they still point to the valid location of the object.
There's an example taken from here:
Which is explained: