I am reading some materials about garbage collection in Java in order to get to know more deeply what really happens in GC process.
I came across on the mechanism called "card table". I've Googled for it and haven't found comprehensive information. Most of explanations are rather shallow and describes it like some magic.
My question is: How card table and write barrier works? What is marked in card tables? How then garbage collector knows that particular object is referenced by another object persisted in older generation.
I would like to have some practical imagination about that mechanism, like I was supposed to prepare some simulation.
For any one who is looking for a simple answer:
In JVM, the memory space of objects is broken down into two spaces:
The idea is that, once an object survives a few garbage collection, it is more likely to survive for a long time. So, objects that survive garbage collection for more than a threshold, will be promoted to old generation. The garbage collector runs more frequently in young generation and less frequently in old generation. This is because most objects live for a very short time.
We use generational garbage collection to avoid scanning of the whole memory space (like Mark and Sweep approach). In JVM we have a minor garbage collection which is when GC runs inside the young generation and a major garbage collection (or full GC) which encompasses garbage collection of both young and old generations.
When doing minor garbage collection, we follow every reference from the live roots to the objects in the young generation, and mark those objects as live, which excludes them from the garbage collection process. The problem is that there may be some references from the objects in the old generations to the objects in young generations, which should be considered by GC, meaning those objects in young generation which are referenced by objects in old generation should also be marked as live and excluded from the garbage collection process.
One approach to solve this problem is to scan all objects in the old generation and find their references to young objects. But this approach is in contradiction with the idea of generational garbage collectors. (Why we broke down our memory into multiple generations in the first place?)
Another approach is using write barriers and card table. When an object in old generation writes/updates a reference to an object in the young generation, this action goes through something called write barrier. When JVM sees these write barriers, it updates the corresponding entry in the card table. Card table is a table, which each one of it's entry corresponds to 512 bytes of memory. You can think of it as an array containing
0
and1
items. A1
entry means there is an object in the corresponding area of the memory which contains references to objects in young generation.Now, when minor garbage collection is happening, first every reference from the live roots to young objects are followed and the referenced objects in young generation will be marked as live. Then, instead of scanning all of the old object to find references to the young objects, the card table is scanned. If GC finds any marked area in the card table, it loads the corresponding object and follows it's references to young objects and marks them as live.
I don't know whether you found some exceptionally bad description or whether you expect too many details, I've been quite satisfied with the explanations I've seen. If the descriptions are brief and sound simplistic, that's because it really is a rather simple mechanism.
As you apparently already know, a generational garbage collector needs to be able to enumerate old objects that refer to young objects. It would be correct to scan all old objects, but that destroys the advantages of the generational approach, so you have to narrow it down. Regardless of how you do that, you need a write barrier - a piece of code executed whenever a member variable (of a reference type) is assigned/written to. If the new reference points to a young object and it's stored in an old object, the write barrier records that fact for the garbage collect. The difference lies in how it's recorded. There are exact schemes using so-called remembered sets, a collection of every old object that has (had at some point) a reference to a young object. As you can imagine, this takes quite a bit of space.
The card table is a trade-off: Instead of telling you which objects exactly contains young pointers (or at least did at some point), it groups objects into fixed-sized buckets and tracks which buckets contain objects with young pointers. This, of course, reduces space usage. For correctness, it doesn't really matter how you bucket the objects, as long as you're consistent about it. For efficiency, you just group them by their memory address (because you have that available for free), divided by some larger power of two (to make the division a cheap bitwise operation).
Also, instead of maintaining an explicit list of buckets, you reserve some space for each possible bucket up-front. Specifically, there is an array of N bits or bytes, where N is the number of buckets, so that the
i
th value is 0 if thei
th bucket contains no young pointers, or 1 if it does contain young pointers. This is the card table proper. Typically this space is allocated and freed along with a large block of memory used as (part of) the heap. It may even be embedded in the start of the memory block, if it doesn't need to grow. Unless the entire address space is used as heap (which is very rare), the above formula gives numbers starting fromstart_of_memory_region >> K
instead of 0, so to get an index into the card table you have to subtract the start of the start address of the heap.In summary, when the write barrier finds that the statement
some_obj.field = other_obj;
stores a young pointer in an old object, it does this:Where
&old_obj
is the address of the object that now has a young pointer (which is already in a register because it was just determined to refer to an old object). During minor GC, the garbage collector looks at the card table to determine which heap regions to scan for young pointers:Sometime ago I have written an article explaining mechanics of young collection in HotSpot JVM. Understanding GC pauses in JVM, HotSpot's minor GC