How much extra memory does garbage collection require?

Published: 2019-03-18 07:30

Question:

I heard once that for a language to implement and run garbage collection correctly, on average 3x more memory is required. I am not sure if this assumes the application is small, large, or either.

So I wanted to know if there's any research or actual numbers on garbage collection overhead. Also, I want to say GC is a very nice feature.

Answer 1:

The amount of memory headroom you need depends on the allocation rate within your program. If you have a high allocation rate, you need more room for growth while the GC works.

The other factor is object lifetime. If your objects typically have a very short lifetime, then you may be able to manage with slightly less headroom with a generational collector.
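The interplay of a high allocation rate and short object lifetimes can be observed directly from within a JVM. The sketch below uses the standard `GarbageCollectorMXBean` API to count how many GC cycles a burst of short-lived allocations triggers; the allocation size and loop count are arbitrary illustrations, not tuning advice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class AllocationPressure {
    public static void main(String[] args) {
        long before = totalCollections();
        long sink = 0;
        // Allocate many short-lived objects: a high allocation rate with a
        // low survival rate, the case a generational collector handles best.
        for (int i = 0; i < 5_000_000; i++) {
            byte[] temp = new byte[128]; // dead after this iteration
            sink += temp.length;         // keep the allocation observable
        }
        long after = totalCollections();
        System.out.println("bytes allocated (approx): " + sink);
        System.out.println("GC cycles triggered: " + (after - before));
    }

    static long totalCollections() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) sum += c; // getCollectionCount() may return -1 if undefined
        }
        return sum;
    }
}
```

The exact cycle count depends on heap size and collector choice, which is the point: the same program needs very different headroom under different configurations.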

There are plenty of research papers that may interest you. I'll edit a bit later to reference some.

Edit (January 2011):

I was thinking of a specific paper that I can't seem to find right now. The ones below are interesting and contain some relevant performance data. As a rule of thumb, you are usually ok with about twice as much memory available as your program residency. Some programs need more, but other programs will perform very well even in constrained environments. There are lots of variables that influence this, but allocation rate is the most important one.

  1. Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance

  2. Myths and realities: the performance impact of garbage collection

    Edit (February 2013): This edit adds a balanced perspective on a paper cited, and also addresses objections raised by Tim Cooper.

  3. Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, as noted by Natan Yellin, is actually the reference I was first trying to remember back in January 2011. However, I don't think the interpretation Natan has offered is correct. That study does not compare GC against conventional manual memory management; rather, it compares GC against an oracle that performs perfect explicit releases. In other words, it leaves us not knowing how well conventional manual memory management compares to the magic oracle. It is also very hard to find this out, because the source programs are written either with GC in mind or with manual memory management in mind, so any benchmark retains an inherent bias.

Following Tim Cooper's objections, I'd like to clarify my position on the topic of memory headroom. I do this mainly for posterity, as I believe Stack Overflow answers should serve as a long-term resource for many people.

There are many memory regions in a typical GC system, but three abstract kinds are:

  • Allocated space (contains live, dead, and untraced objects)
  • Reserved space (from which new objects are allocated)
  • Working region (long-term and short-term GC data structures)

What is headroom anyway? Headroom is the minimum amount of reserved space needed to maintain a desired level of performance. I believe that is what the OP was asking about. You can also think of the headroom as the memory, additional to the actual program residency (maximum live memory), necessary for good performance.
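The definition can be made concrete as simple arithmetic (the numbers and names here are my own illustration, not from any particular collector):

```java
public class HeadroomDefinition {
    // Headroom = memory beyond maximum live memory (residency) that the
    // collector can hand out to the mutator between/during GC cycles.
    static long headroomKb(long heapCapacityKb, long residencyKb) {
        return heapCapacityKb - residencyKb;
    }

    public static void main(String[] args) {
        // The "about twice as much memory as residency" rule of thumb
        // from earlier in this answer, with a 1000 KB residency:
        long residencyKb = 1000;
        System.out.println(headroomKb(2 * residencyKb, residencyKb) + " KB of headroom");
    }
}
```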

Yes -- increasing the headroom can delay garbage collection and increase throughput. That is important for offline non-critical operations.

In reality most problem domains require a realtime solution. There are two kinds of realtime, and they are very different:

  • hard-realtime concerns worst case delay (for mission critical systems) -- a late response from the allocator is an error.
  • soft-realtime concerns either average or median delay -- a late response from the allocator is ok, but shouldn't happen often.
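The distinction amounts to which statistic of the pause distribution you bound. A toy computation, with invented pause samples:

```java
import java.util.Arrays;

public class PauseBudget {
    // Hard realtime bounds the worst-case pause.
    static long worstMs(long[] pauses) {
        long[] s = pauses.clone();
        Arrays.sort(s);
        return s[s.length - 1];
    }

    // Soft realtime bounds a typical pause (here: the median).
    static long medianMs(long[] pauses) {
        long[] s = pauses.clone();
        Arrays.sort(s);
        return s[s.length / 2];
    }

    public static void main(String[] args) {
        // Hypothetical allocator pause samples in milliseconds.
        long[] pauses = {1, 2, 2, 3, 2, 40, 2, 1, 3, 2};
        // A single 40 ms outlier violates a hard-realtime bound...
        System.out.println("worst case: " + worstMs(pauses) + " ms");
        // ...but barely moves the soft-realtime (median) picture.
        System.out.println("median: " + medianMs(pauses) + " ms");
    }
}
```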

Most state of the art garbage collectors aim for soft-realtime, which is good for desktop applications as well as for servers that deliver services on demand. If one eliminates realtime as a requirement, one might as well use a stop-the-world garbage collector in which headroom begins to lose meaning. (Note: applications with predominantly short-lived objects and a high allocation rate may be an exception, because the survival rate is low.)

Now suppose that we are writing an application that has soft-realtime requirements. For simplicity let's suppose that the GC runs concurrently on a dedicated processor. Suppose the program has the following artificial properties:

  • mean residency: 1000 KB
  • reserved headroom: 100 KB
  • GC cycle duration: 1000 ms

And:

  • allocation rate A: 100 KB/s
  • allocation rate B: 200 KB/s

Now we might see the following timeline of events with allocation rate A:

  • T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
  • T+1000 ms:
    • 0 KB free in reserved space, 1100 KB allocated
    • GC cycle ends, 100 KB released
    • 100 KB free in reserve, 1000 KB allocated
  • T+2000 ms: same as above

The timeline of events with allocation rate B is different:

  • T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
  • T+0500 ms:
    • 0 KB free in reserved space, 1100 KB allocated
    • either
      • delay until end of GC cycle (bad, but sometimes mandatory), or
      • increase reserved size to 200 KB, with 100 KB free (assumed here)
  • T+1000 ms:
    • 0 KB free in reserved space, 1200 KB allocated
    • GC cycle ends, 200 KB released
    • 200 KB free in reserve, 1000 KB allocated
  • T+2000 ms:
    • 0 KB free in reserved space, 1200 KB allocated
    • GC cycle ends, 200 KB released
    • 200 KB free in reserve, 1000 KB allocated

Notice how the allocation rate directly impacts the size of the headroom required? With allocation rate B, we require twice the headroom to prevent pauses and maintain the same level of performance.
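The two timelines above can be reduced to a one-line model (a sketch only, assuming a fixed-duration concurrent cycle, an instantaneous release at cycle end, and the invented numbers from this example):

```java
public class HeadroomModel {
    // Minimum reserve so the mutator never stalls: everything the program
    // allocates during one concurrent GC cycle must fit in reserved space.
    static long requiredHeadroomKb(long allocRateKbPerSec, long cycleMs) {
        return allocRateKbPerSec * cycleMs / 1000;
    }

    public static void main(String[] args) {
        long cycleMs = 1000;           // GC cycle duration from the example
        long rateA = 100, rateB = 200; // allocation rates A and B, in KB/s

        // Rate A fits the original 100 KB reserve; rate B hits the
        // T+0500 ms stall unless the reserve is doubled to 200 KB.
        System.out.println("rate A needs " + requiredHeadroomKb(rateA, cycleMs) + " KB of headroom");
        System.out.println("rate B needs " + requiredHeadroomKb(rateB, cycleMs) + " KB of headroom");
    }
}
```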

This was a very simplified example designed to illustrate only one idea. There are plenty of other factors, but it does show what was intended. Keep in mind the other major factor I mentioned: average object lifetime. Short lifetimes cause low survival rates, which work together with the allocation rate to influence the amount of memory required to maintain a given level of performance.

In short, one cannot make general claims about the headroom required without knowing and understanding the characteristics of the application.



Answer 2:

According to the 2005 study Quantifying the Performance of Garbage Collection vs. Explicit Memory Management (PDF), generational garbage collectors need 5 times the memory to achieve equal performance. The emphasis below is mine:

We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.



Answer 3:

I hope the original author clearly marked what they regard as correct usage of garbage collection and the context of their claim.

The overhead certainly depends on many factors; e.g., the overhead is larger if you run your garbage collector less frequently; a copying garbage collector has a higher overhead than a mark and sweep collector; and it is much easier to write a garbage collector with lower overhead in a single-threaded application than in the multi-threaded world, especially for anything that moves objects around (copying and/or compacting gc).



Answer 4:

So I wanted to know if there's any research or actual numbers on garbage collection overhead.

Almost 10 years ago I studied two equivalent programs I had written, one in C++ using the STL (GCC on Linux) and one in OCaml using its garbage collector. I found that the C++ version used twice as much memory on average. I tried to improve it by writing custom STL allocators but was never able to match the memory footprint of the OCaml.

Furthermore, GCs typically do a lot of compaction which further reduces the memory footprint. So I would challenge the assumption that there is a memory overhead compared to typical unmanaged code (e.g. C++ using what are now the standard library collections).