I'm reading on JVM tuning, and it occurred to me that the JVM keeps moving objects around when it does GC. But Java Objects have references to each other, which one would presume are implemented as pointers, but the JVM can't possibly go over the whole heap after every time it moved objects around, and update all the references; surely that would take for ever. So how does it resolve references, if the references do not change, but the physical location of the objects do?
I've read a lot about the JVM, but that was never explained, or even hinted at, anywhere.
[EDIT] My point is that references are one-way things. Going from the pointer to the pointed is "instantaneous", but going the other way around would require a full heap scan. While it is possible, it seems unlikely. If 10K objects survive a minor collection, how long would it take to do a full heap scan 10K times to update the references to those objects? There must be some kind of optimized algorithm or structure used.
Usually, collectors dont walk the entire heap. They identify live objects and traverse them.
For example, the copying collector in Hotspot, starts with roots and identifies all the live objects. Once the live objects are identified, they are copied to a new space on heap. In walking all the live objects, it does the required address modifications for the live objects.
Once this is done, all thats get left in the old space are dead objects and the objects that are already moved. This free space is reclaimed by GC and is used in future to move other live objects into it.
The time taken is proportional to number of live objects on heap.
I am not absolutely sure that this IS the way object references in the heap are managed, but I suspect that the object references that Java VM hands out to our programs are NOT the actual memory addresses but the internal JVM references that point to the actual address in JVM (HashMap or similar structure). I.e. all objects that refer to objectA will have references [NOT address] to objectA, when GC occurs JVM does NOT need to update the references in all these objects, just the actual changed address in it's own HashMap.
It sure does scan through the whole heap to detect the object that are no longer referenced by anybody and mark them as eligible to be collected and to place all active objects in a compact memory area to avoid fragmentation.
How it does it depends on the garbage collection algorithms used but it is a time consuming process indeed and that is a reason why Java (per se) can't be used in real time constraints
I'm no expert on GC myself, but as far as I know, that is more or less what it does. See e.g. this text:
( http://wiki.osdev.org/Garbage_collection#Copy_collectors , emphasis mine).
As to this "taking for ever" -- the main idea behind a copying (or moving) garbage collector is that only a small amount of objects will actually need to be moved, because most instances are already dead (i.e. most instances are very short-lived). So the number of objects that move is small, and hopefully the number of references pointing to them is also fairly small.
At any rate, the GC must build a list of object references anyway (to find out which objects are still referenced/alive and need to be copied), so it can probably reuse that list to update the references. So the only the updating is "extra work".
If you are really interested in how garbage collectors work, can I recommend Richard Jones' 2 books on Garbage Collection. Links / references are here. This isn't specifically about Java garbage collection.
(I have a copy of the older book, and the new one is on my shopping list.)
Here's a simple version of how a copying collector deals with this problem.
A copying collector works by copying objects from one space (the from-space) to another one (the to-space).
Specifically, the GC walks the graph of reachable objects within the "from" space, starting from each of the GC roots. Each time it finds a reference to a node (in an instance field, static field, stack frame, etc), it checks the object that the reference points to to see if it has been marked as visited.
If it is not yet marked, the GC does the following:
The result of this the reference to the to-space object.
If the object has been marked already, the GC looks up the forwarding address, and returns that.
The location (in to-space, or some GC root) where the GC got the reference from is then updated with the pointer to the object in to-space.
If you follow all of that, then you will see that the GC doesn't need to go looking for all of the places that hold a reference to a given moved object. Instead, it simply encounters all of the places in the traversal of the reachable objects. Of course, the GC does have to do that traversal, but there are various techniques to reduce the amount of traversing that needs to be done in each GC cycle.
If you haven't followed the above, then PLEASE go read one of the textbooks that I've recommended. They'll do a much better job of explaining it than I can do. You'll also find material on how other kinds of GC deal with this issue.
The Java HotSpot GCs are all copying collectors of one form or another. Things get a bit more complicated than my description above for parallel and concurrent collecting, but the "forwarding address" mechanism is common to all of them.
(There are not many published papers or other public documentation on HotSpot GCs, and most of the material that exists assumes that the reader has a good understanding of how modern garbage collectors work.)