They say that compacting garbage collectors are faster than traditional memory management because they only have to collect live objects, and by rearranging them in memory so everything's in one contiguous block, you end up with no heap fragmentation. But how can that be done quickly? It seems to me that that's equivalent to the bin-packing problem, which is NP-hard and can't be completed in a reasonable amount of time on a large dataset within the current limits of our knowledge about computation. What am I missing?
Compaction means moving objects in RAM so that the dead objects (the ones the GC is supposed to reclaim) disappear and all remaining live objects become contiguous in RAM.
Most compacting GCs work by allocating objects within a single, contiguous area obtained from the operating system. Compaction then amounts to removing the dead objects and pushing all remaining live objects "to the left", squeezing out the holes. If the GC works by compaction, then allocation is simply a matter of moving up the "end of allocated area" pointer. In other words, within the allocation area there is a pointer such that the free area consists of the bytes after it. To allocate space for an object, the pointer is simply moved up by the new object's size. Occasionally, the GC decides that it is time to run, detects the dead objects, squeezes the holes out, and thus lowers the allocation pointer.
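To make the allocation side concrete, here is a minimal bump-pointer allocator sketch in C; the heap layout and the names are invented for illustration, not taken from any real VM:

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE (1 << 20)

static uint8_t  heap[HEAP_SIZE];
static uint8_t *alloc_ptr = heap;             /* free area starts here */
static uint8_t *heap_end  = heap + HEAP_SIZE;

/* Allocation: move the "end of allocated area" pointer up by the
 * (aligned) object size.  No free-list, no search. */
void *gc_alloc(size_t size) {
    size = (size + 7) & ~(size_t)7;           /* keep 8-byte alignment */
    if ((size_t)(heap_end - alloc_ptr) < size)
        return NULL;                          /* time to run the GC    */
    void *obj = alloc_ptr;
    alloc_ptr += size;
    return obj;
}
```

After a collection squeezes the holes out, the collector simply lowers `alloc_ptr` back to the end of the live data.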
The performance gains from a compacting GC come from several sources: allocation is a mere pointer increment, there is no free-list to search or maintain, and live objects end up contiguous, which eliminates fragmentation and improves cache locality.
If the operating system refuses to give a single allocation area, and instead yields several blocks, then things become a bit more complex and could begin to look like the bin-packing problem, because the compacting GC then has to decide into which block each live object goes. However, the hardness of bin-packing is about finding the optimal packing in the general case; an approximate solution is already good enough for a memory allocator.
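As a sketch of why "approximate is good enough": a greedy first-fit placement runs in linear time per object, with no exponential search. Everything below (the `block` structure, the `place` function) is an invented illustration:

```c
#include <stddef.h>
#include <stdint.h>

struct block {
    uint8_t *base;   /* start of the block                  */
    size_t   used;   /* bytes already assigned in this pass */
    size_t   size;   /* total capacity of the block         */
};

/* Choose a destination for one live object: take the first block with
 * enough room.  Optimal packing is NP-hard, but this greedy choice is
 * cheap and wastes little space in practice. */
static uint8_t *place(struct block *blocks, size_t nblocks, size_t obj_size) {
    for (size_t i = 0; i < nblocks; i++) {
        if (blocks[i].size - blocks[i].used >= obj_size) {
            uint8_t *dest = blocks[i].base + blocks[i].used;
            blocks[i].used += obj_size;
            return dest;
        }
    }
    return NULL;   /* no block fits: ask the OS for another block */
}
```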
The algorithmic difficulty in a compaction algorithm is about updating all pointers, so that they point to the new object location. Through strict typing, the .NET virtual machine can unambiguously decide whether each word in RAM is a pointer or not, but updating all pointers efficiently without using too much extra RAM can be tricky. H.B.M. Jonkers has described a wonderfully smart algorithm for that, in "A Fast Garbage Compaction Algorithm" (Information Processing Letters, Volume 9, Number 1, 1979, pp. 26-30). I could not find a copy of that paper on the Vast Internet, but the algorithm is described in the "Garbage Collection" book by Jones and Lins (section 5.6). I warmly recommend that book to anyone who is interested in understanding garbage collectors. Jonkers' algorithm requires two linear passes on the live objects and turns out to be easy to implement (a few dozen lines of code, no more; the most difficult part is understanding why it works).
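Below is a compact C sketch of Jonkers' two-pass threaded compaction as described by Jones and Lins; the word-array heap, the odd-tagging of non-pointer words, and the side mark bitmap are assumptions made for the sketch:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HEAP_WORDS 64
typedef uintptr_t word;

static word heap[HEAP_WORDS];           /* headers and fields          */
static unsigned char live[HEAP_WORDS];  /* mark bits, kept on the side */
static size_t heap_top;                 /* words currently in use      */

/* Header: (size_in_words << 1) | 1.  References are word-aligned
 * addresses (low bit clear); non-pointer fields are odd-tagged by
 * convention, so one bit tells references and data apart. */
static word   mk_info(size_t size) { return ((word)size << 1) | 1u; }
static size_t size_of(word w)      { return (size_t)(w >> 1); }
static int    is_ref(word w)       { return w != 0 && (w & 1) == 0; }

/* Splice the reference held in *slot into the chain hanging off the
 * referenced object's header. */
static void thread(word *slot) {
    if (is_ref(*slot)) {
        word *target = (word *)*slot;
        *slot = *target;
        *target = (word)slot;
    }
}

/* Walk the chain on an object's header: every threaded slot is made to
 * point to new_addr; the original header word is restored at the end. */
static void unthread(word *obj, word new_addr) {
    word w = *obj;
    while (is_ref(w)) {
        word *slot = (word *)w;
        w = *slot;
        *slot = new_addr;
    }
    *obj = w;
}

static void compact(word **roots, size_t nroots) {
    size_t scan, dest, size, i;
    for (i = 0; i < nroots; i++)
        thread(roots[i]);
    /* Pass 1: references that point forward get their new value here;
     * each live object's own fields are threaded for later. */
    for (scan = 0, dest = 0; scan < heap_top; scan += size) {
        if (live[scan]) {
            unthread(&heap[scan], (word)&heap[dest]);
            size = size_of(heap[scan]);
            for (i = 1; i < size; i++)
                thread(&heap[scan + i]);
            dest += size;
        } else {
            size = size_of(heap[scan]);
        }
    }
    /* Pass 2: remaining (backward) references get their new value,
     * then the object slides down to its final place. */
    for (scan = 0, dest = 0; scan < heap_top; scan += size) {
        if (live[scan]) {
            unthread(&heap[scan], (word)&heap[dest]);
            size = size_of(heap[scan]);
            memmove(&heap[dest], &heap[scan], size * sizeof(word));
            dest += size;
        } else {
            size = size_of(heap[scan]);
        }
    }
    memset(live, 0, sizeof live);       /* marks are consumed */
    heap_top = dest;
}

int main(void) {
    /* Three objects: A (live, points to C), B (dead), C (live, points to A). */
    word *A = &heap[0], *B = &heap[3], *C = &heap[5];
    A[0] = mk_info(3); A[1] = (word)C; A[2] = 1;   /* odd word = plain data */
    B[0] = mk_info(2); B[1] = 1;
    C[0] = mk_info(3); C[1] = (word)A; C[2] = 1;
    heap_top = 8;
    live[0] = live[5] = 1;

    word root = (word)A;
    word *roots[] = { &root };
    compact(roots, 1);

    /* B is squeezed out: A stays at heap[0], C slides down to heap[3]. */
    printf("root -> heap[%td]\n", (word *)root - heap);     /* 0 */
    printf("A.f  -> heap[%td]\n", (word *)heap[1] - heap);  /* 3 */
    printf("C.f  -> heap[%td]\n", (word *)heap[4] - heap);  /* 0 */
    return 0;
}
```

The trick is that all references to an object are temporarily spliced into a chain hanging off that object's header: pass 1 resolves the references that point forward (and the roots), and pass 2 resolves the backward ones just before sliding each object down.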
Additional complexity comes from generational collectors, which try, most of the time, to leave most objects untouched, working preferentially with young objects only. Here, this means compacting only the end of the heap, with a full compaction applied only rarely. The point is that a full compaction, although linear, could still induce a noticeable pause; generational GC makes such pauses rarer. Here again, Jones and Lins' book is a must-read.
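As a toy illustration of the policy (every name, the threshold, and the promotion rule here are invented, and `compact_range` is a stub standing in for a real mark-and-slide pass such as Jonkers'):

```c
#include <stddef.h>

static size_t heap_top;      /* end of the allocated area                */
static size_t old_gen_top;   /* objects below this boundary are "old"    */
static unsigned minors;      /* minor collections since the last full GC */

/* Stub standing in for a real mark-and-slide pass over
 * [from, heap_top); it would return the new end of the live data. */
static size_t compact_range(size_t from) { return from; }

void collect(void) {
    if (minors < 8) {               /* arbitrary threshold, for the sketch */
        /* Minor collection: only the young region at the end of the
         * heap is compacted, so the pause scales with the nursery. */
        heap_top = compact_range(old_gen_top);
        minors++;
    } else {
        /* Rare full collection: compact the whole heap; linear, but
         * the pause is proportional to all live data. */
        heap_top = compact_range(0);
        minors = 0;
    }
    old_gen_top = heap_top;         /* survivors are promoted to "old" */
}
```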