How does the GC update references after compaction

Posted 2019-01-19 11:50

The .NET Garbage Collector collects objects (reclaims their memory) and also performs memory compaction (to keep memory fragmentation to a minimum).

Since an application may hold many references to an object, I am wondering how the GC (or the CLR) keeps those references valid when the object's address changes because the GC compacted the heap.

3 Answers
Deceive 欺骗
#2 · 2019-01-19 12:17

For simplicity, I'll assume a stop-the-world GC in which no objects are pinned, every object gets scanned and relocated on every GC cycle, and none of the destinations overlap any of the sources. In actuality, the .NET GC is a bit more complicated, but this should give a good feel for how things work.

Each time a reference is examined, there are three possibilities:

  1. It's null. In that case, no action is required.

  2. It identifies an object whose header says it's something other than a relocation marker (a special kind of object described below). In that case, move the object to a new location and replace the original object with a three-word relocation marker containing the new location, the old location of the object which contains the just-observed reference to the present object, and the offset within that object. Then start scanning the new object (the system can forget about the object that was being scanned for the moment, since it just recorded its address).

  3. It identifies an object whose header says it's a relocation marker. In that case, update the reference being scanned to reflect the new address.

Once the system finishes scanning the present object, it can look at its old location to find out what it was doing before it started scanning the present object.

Once an object has been relocated, the former contents of its first three words will be available at its new location and will no longer be needed at the old one. Because the offset into an object will always be a multiple of four, and individual objects are limited to 2GB each, there are at most 2^31 / 4 = 2^29 possible offsets, so only a fraction of all possible 32-bit values is needed to hold them. Provided that at least one word in an object's header has at least 2^29 values it can never hold for anything other than an object-relocation marker, and provided every object is allocated at least twelve bytes, it's possible for object scanning to handle any depth of tree without requiring any depth-dependent storage outside the space occupied by old copies of objects whose content is no longer needed.
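The scan described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not how the .NET GC is implemented: addresses are dictionary keys, each object is a list of reference slots, and the names `Marker` and `relocate` are made up for the example. The point it shows is that the relocation markers left at the old addresses double as the "return stack", so no extra depth-dependent storage is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Marker:
    """Hypothetical three-word relocation marker left at an object's old address."""
    new_addr: int               # where the object now lives
    parent_old: Optional[int]   # old address of the object that referenced it
    parent_slot: int            # slot in that parent holding the reference


def relocate(from_space, root, first_free=1000):
    """Move everything reachable from `root` into a fresh to-space.

    from_space maps old address -> list of slots (old address or None).
    Returns (new address of root, to_space).
    """
    to_space = {}
    next_free = first_free

    def move(old, parent_old, parent_slot):
        nonlocal next_free
        new = next_free
        next_free += 1
        to_space[new] = list(from_space[old])  # copy the object's slots
        from_space[old] = Marker(new, parent_old, parent_slot)
        return new

    if root is None:
        return None, to_space

    new_root = move(root, None, -1)
    cur_old, slot = root, 0
    while cur_old is not None:
        marker = from_space[cur_old]
        fields = to_space[marker.new_addr]
        if slot == len(fields):
            # Done with this object: its old location says what to resume.
            cur_old, slot = marker.parent_old, marker.parent_slot + 1
            continue
        ref = fields[slot]
        if ref is None:
            slot += 1                                  # case 1: null
        elif isinstance(from_space[ref], Marker):
            fields[slot] = from_space[ref].new_addr    # case 3: already moved
            slot += 1
        else:
            fields[slot] = move(ref, cur_old, slot)    # case 2: move, then scan it
            cur_old, slot = ref, 0
    return new_root, to_space


# A small graph with a cycle (30 -> 10) and one unreachable object (40).
old_heap = {10: [20, 30], 20: [30, None], 30: [10], 40: [10]}
new_root, new_heap = relocate(old_heap, 10)
```

After the call, every slot in `new_heap` holds a new address, while the unreachable object at 40 is never copied and is simply left behind in the old space.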

叛逆
#3 · 2019-01-19 12:20

The concept is simple enough: the garbage collector updates any object references and re-points them to the moved object.

Implementation is a bit trickier. There is no real difference between native and managed code; both are machine code. And there's nothing special about an object reference; it is just a pointer at runtime. What's needed is a reliable way for the collector to find these pointers and recognize them as the kind that reference a managed object. That matters not just for updating them when the pointed-to object gets moved while compacting, but also for recognizing live references, which ensure that an object does not get collected too soon.

That's simple for object references stored in class objects on the GC heap: the CLR knows the layout of the object and which fields store a pointer. It is not so simple for object references stored on the stack or in a CPU register, such as local variables and method arguments.
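To make the "knows the layout" point concrete, here is a minimal sketch with made-up names (`REFERENCE_SLOTS`, `fix_up`) and a made-up table format; the real CLR metadata looks nothing like this. It only shows why layout information matters: a value that happens to look like an address is rewritten only if the layout says that slot holds a reference.

```python
# Hypothetical per-type layout table: which field slots hold object references.
REFERENCE_SLOTS = {
    "Node": (1, 2),   # slot 0 is a plain integer, slots 1 and 2 are references
    "Point": (),      # two plain integers, no references at all
}


def fix_up(type_name, fields, forwarding):
    """Return a copy of `fields` with reference slots rewritten to new addresses.

    `forwarding` maps old address -> new address for objects that were moved.
    """
    updated = list(fields)
    for slot in REFERENCE_SLOTS[type_name]:
        old = updated[slot]
        if old is not None:
            updated[slot] = forwarding.get(old, old)
    return updated


moved = {100: 8}  # the object formerly at address 100 now lives at 8
node_after = fix_up("Node", [100, 100, None], moved)
point_after = fix_up("Point", [100, 100], moved)
```

In `node_after` only slot 1 changes; slot 0 keeps the integer 100 even though it equals the old address, and the `Point` is left untouched.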

The key property that distinguishes executing managed code from native code is that the CLR can reliably iterate the stack frames owned by managed code. It does so by restricting the kind of code used to set up a stack frame. This is not typically possible in native code; the "frame pointer omission" optimization option is particularly nasty.

First of all, stack frame walking lets the collector find object references stored on the stack. It also lets it know that the thread is currently executing managed code, so that the CPU registers should be checked for references as well. A transition from managed code to native code involves writing a special "cookie" on the stack that the collector recognizes, so it knows that any subsequent stack frames should not be checked because they'll contain random pointer values that don't ever reference a managed object.

You can see this in the debugger when you enable unmanaged code debugging. Look at the Call Stack window and note the [Native to Managed Transition] and [Managed to Native Transition] annotations. That's the debugger recognizing those cookies. This is important for the debugger as well, since it needs to know whether or not the Locals window can display anything meaningful. The stack walk is also exposed in the framework; note the StackTrace and StackFrame classes. And it is very important for sandboxing: Code Access Security (CAS) performs stack walks.

做个烂人
#4 · 2019-01-19 12:25

Garbage collection

Every application has a set of roots. Roots identify storage locations, which refer to objects on the managed heap or to objects that are set to null. For example, all the global and static object pointers in an application are considered part of the application's roots. In addition, any local variable/parameter object pointers on a thread's stack are considered part of the application's roots. Finally, any CPU registers containing pointers to objects in the managed heap are also considered part of the application's roots. The list of active roots is maintained by the just-in-time (JIT) compiler and common language runtime, and is made accessible to the garbage collector's algorithm.

When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap. Now, the garbage collector starts walking the roots and building a graph of all objects reachable from the roots. For example, the garbage collector may locate a global variable that points to an object in the heap.

Once this part of the graph is complete, the garbage collector checks the next root and walks the objects again. As the garbage collector walks from object to object, if it attempts to add an object to the graph that it previously added, then the garbage collector can stop walking down that path. This serves two purposes. First, it helps performance significantly since it doesn't walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects.
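The root walk described so far can be sketched roughly as follows. This is a toy model, not the CLR's data structures: the heap is a list whose indexes stand in for addresses, each object is a list of references, and the name `mark` is chosen for the example. The `marked` set plays the role of the graph being built, and the early `continue` is what keeps circular structures from looping forever.

```python
def mark(heap, roots):
    """Return the set of heap addresses reachable from the roots.

    heap:  list where heap[addr] is a list of references (addresses or None)
    roots: dict mapping a root name to an address or None
    """
    marked = set()
    pending = [addr for addr in roots.values() if addr is not None]
    while pending:
        addr = pending.pop()
        if addr in marked:
            continue  # already in the graph: stop walking down this path
        marked.add(addr)
        pending.extend(ref for ref in heap[addr] if ref is not None)
    return marked


# Objects 0 -> 2 -> 4 are reachable; 2 also points back to 0 (a cycle).
# Object 1 points into the live graph but nothing points to it, and
# object 3 only references itself, so both are garbage.
example_heap = [[2], [0], [4, 0], [3], []]
example_roots = {"static_a": 0, "local_b": None}
live = mark(example_heap, example_roots)
```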

Once all the roots have been checked, the garbage collector's graph contains the set of all objects that are somehow reachable from the application's roots; any objects that are not in the graph are not accessible by the application, and are therefore considered garbage. The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space). The garbage collector then shifts the non-garbage objects down in memory (using the standard memcpy function that you've known for years), removing all of the gaps in the heap. Of course, moving the objects in memory invalidates all pointers to the objects. So the garbage collector must modify the application's roots so that the pointers point to the objects' new locations. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.
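The compaction and pointer fix-up step might look roughly like this in a toy model. The assumptions are the same as before and are not taken from the article: list indexes stand in for addresses, the set of live addresses is already known from the marking phase, and `compact` is a name invented for the example. A forwarding table records each survivor's new address, and both the roots and the fields inside surviving objects are rewritten through it.

```python
def compact(heap, roots, live):
    """Slide live objects down and rewrite every reference to them.

    heap:  list where heap[addr] is a list of references (addresses or None)
    roots: dict mapping a root name to an address or None
    live:  set of addresses found reachable by the marking phase
    Returns (new_heap, new_roots).
    """
    # Walk the heap linearly; survivors keep their relative order.
    forwarding = {}
    for old_addr in range(len(heap)):
        if old_addr in live:
            forwarding[old_addr] = len(forwarding)

    def forward(ref):
        return None if ref is None else forwarding[ref]

    # Copy each survivor to its new slot, correcting its fields on the way.
    new_heap = [None] * len(forwarding)
    for old_addr, new_addr in forwarding.items():
        new_heap[new_addr] = [forward(ref) for ref in heap[old_addr]]

    # The roots must point at the new locations as well.
    new_roots = {name: forward(ref) for name, ref in roots.items()}
    return new_heap, new_roots


before = [[2], [0], [4, 0], [3], []]
compacted, fixed_roots = compact(before, {"a": 2, "b": None}, live={0, 2, 4})
```

Here the object at address 2 ends up at address 1 and the one at 4 ends up at 2, so root `a` and the fields that referred to them are updated to match.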

C# fixed statement

The fixed statement sets a pointer to a managed variable and "pins" that variable during the execution of the statement. Without fixed, pointers to movable managed variables would be of little use since garbage collection could relocate the variables unpredictably. The C# compiler only lets you assign a pointer to a managed variable in a fixed statement.

Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework

fixed Statement (C# Reference)
