When NOT to use garbage collection?

Posted on 2019-03-18 18:17

Question:

The obvious cases for not using garbage collection are hard realtime, severely limited memory, and wanting to do bit twiddling with pointers. Are there any other, less discussed, good reasons why someone would prefer manual memory management instead of GC?

Answer 1:

It IS possible to use garbage collection in hard real-time systems, if you have a fully incremental garbage collector with bounded execution time per byte of allocated memory; so, crazily enough, hard real time is NOT necessarily a reason to avoid garbage collection :)

One fundamental problem with garbage collection, though, is that it is difficult to estimate and manage the actual size of the working set in memory, because the garbage collector can only free your memory after a delay. So, yes, when memory is restricted, garbage collection might not be a good choice.

Another problem with garbage collection is that it sometimes interferes with freeing other resources such as file descriptors, window handles, etc., because, again, the garbage collector might free those resources only after a delay, causing resource starvation.
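To make the contrast concrete, here is a minimal C++ sketch (my illustration, not part of the original answer) of scope-bound cleanup: the file is released deterministically at a known point, rather than whenever a collector gets around to finalizing the object:

#include <cstdio>

// Minimal RAII wrapper: the file is closed the moment the object goes
// out of scope, not whenever a collector decides to finalize it.
class File {
public:
    explicit File(const char* path) : f_(std::fopen(path, "w")) {}
    ~File() { if (f_) std::fclose(f_); }    // deterministic release
    File(const File&) = delete;             // prevent accidental double-close
    File& operator=(const File&) = delete;
    std::FILE* get() const { return f_; }
private:
    std::FILE* f_;
};

int main() {
    {
        File log("example.log");            // descriptor acquired here
        if (log.get()) std::fputs("hello\n", log.get());
    }                                       // ...and released exactly here
    return 0;
}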

Garbage collection can also cause cache thrashing, because memory is not necessarily allocated in a cache-local fashion. For example, stack-allocated memory is much more cache-friendly than heap-allocated short-lived objects.
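As a rough illustration of the locality point (a hypothetical C++ micro-example, not a benchmark from the answer), compare traversing one contiguous block with chasing individually heap-allocated objects:

#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t N = 1000;

// Contiguous storage: sequential traversal touches consecutive cache lines.
long sum_contiguous() {
    int xs[N] = {};                         // one stack-allocated block
    long sum = 0;
    for (std::size_t i = 0; i < N; ++i) sum += xs[i];
    return sum;
}

// Individually heap-allocated objects: the allocations need not be
// adjacent in memory, so each access may miss the cache.
long sum_scattered() {
    std::vector<std::unique_ptr<int>> xs;
    for (std::size_t i = 0; i < N; ++i) xs.push_back(std::make_unique<int>(0));
    long sum = 0;
    for (const auto& p : xs) sum += *p;
    return sum;
}

int main() { return static_cast<int>(sum_contiguous() + sum_scattered()); }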

Finally, garbage collection of course consumes CPU time :) So if you can hand-code your memory management, you can save the CPU cycles the garbage collector would consume :)



Answer 2:

Temporary insanity?

Actually the only case I know of that you haven't covered is so-called "lifetime-based memory management", which sometimes goes under the name "pools" or "arenas" or "regions". The idea is that you're going to allocate a ton of objects, probably small, and they're all going to die at once. So you have a cheap allocator and then you recover all the objects in a single free operation.

These days there are program analyses that will enable a compiler to do this for you, but if you're writing C code, you do it by hand. There's an implementation with examples in Dave Hanson's C Interfaces and Implementations, and it's used in Fraser and Hanson's lcc compiler, which is written in C without a garbage collector.
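For a flavor of the technique, here is a toy arena sketch in C++ (my own illustration, not Hanson's implementation): allocation is a pointer bump, and a single operation recovers every object at once:

#include <cstddef>
#include <cstdlib>
#include <new>

// A toy region/arena allocator: objects are carved out of one big block
// and are all freed together when the arena is destroyed or reset.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          next_(base_), end_(base_ + capacity) {}
    ~Arena() { std::free(base_); }          // one free recovers everything

    // Bump-pointer allocation: round up for alignment, advance a cursor.
    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = ((next_ - base_) + align - 1) / align * align;
        char* p = base_ + offset;
        if (p + n > end_) throw std::bad_alloc();
        next_ = p + n;
        return p;
    }

    void reset() { next_ = base_; }         // "free" every object at once

private:
    char* base_;
    char* next_;
    char* end_;
};

struct Node { int value; Node* next; };

int main() {
    Arena arena(1 << 20);                   // one 1 MiB region
    Node* head = nullptr;
    for (int i = 0; i < 1000; ++i) {        // many small, same-lifetime objects
        head = new (arena.allocate(sizeof(Node))) Node{i, head};
    }
    // No per-node deletes: the whole list dies when the arena does.
}

The point is that thousands of same-lifetime objects cost one malloc, one free, and no per-object bookkeeping.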



Answer 3:

When programming for embedded devices with limited resources. The iPhone, for instance, uses reference counting.
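Reference counting reclaims an object deterministically the moment the last reference drops; here is a minimal C++ sketch of the idea (illustrative only - the iPhone's actual scheme is Objective-C retain/release, not shown here):

#include <cstdio>
#include <memory>

struct Buffer {
    ~Buffer() { std::puts("buffer freed"); }
};

int main() {
    auto a = std::make_shared<Buffer>();    // refcount = 1
    {
        auto b = a;                         // refcount = 2
    }                                       // b destroyed: refcount = 1
    a.reset();                              // refcount = 0: the destructor
    std::puts("after reset");               // runs NOW, not at a later GC cycle
}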

Or when programming something that is extremely demanding on your computer; SETI@Home and video games come to mind.

I would advise that you don't manage your own memory unless the situation dictates it's really necessary. Somebody famous once said that code is twice as hard to debug as it is to write. Well, memory management is hard enough in the first place. :)



Answer 4:

The only reason NOT to use garbage collection for resource management is if you want to use RAII à la C++; but even then, as far as memory alone is concerned, it's a reasonable idea to use GC. (Note: it's still possible to combine the two, with gotchas related to non-deterministic finalization.)
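For readers unfamiliar with the idiom: RAII ties a resource's lifetime to a scope, and it works for any resource, not just memory. A small sketch using the standard std::lock_guard:

#include <mutex>
#include <vector>

std::mutex m;
std::vector<int> shared_data;

void append(int value) {
    std::lock_guard<std::mutex> lock(m);    // lock acquired on construction
    shared_data.push_back(value);
}                                           // ...and released on scope exit,
                                            // even if push_back throws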

That said, garbage collection can use more memory than is strictly needed, so in severely memory-constrained environments, where you cannot even spare the memory for the GC's own routines (and their code), that's a good reason not to use it, too.

Additionally, if the language you use doesn't include GC by default, such as C++, and you like to use RAII, then that's a reasonable reason too, though for some problems a GC can be very useful.

Ultimately it comes down to tradeoffs - the more specialized your requirements, especially with regard to thread-safe RAII, the more complex a GC is to implement, and the less it might buy you for your application.



Answer 5:

Emm... my professor's reason is to make our (his students') lives harder and to teach us "the real thing". Haha :)

In general, though, garbage collection is not always optimized for your specific app, so if you're a good programmer, you can definitely do a better job at memory management than the GC ever will.



Answer 6:

Are there any other, less discussed, good reasons why someone would prefer manual memory management instead of GC?

Perhaps the most important unspoken issue is the code that the VM injects in order to make it work in harmony with the GC. In particular, all production-quality GCs silently incur a write barrier whenever a pointer is written into the heap.

For example, the following F# program creates an array of 10,000 ints and then exchanges its elements pairwise:

do
  // Build an array holding the ints 0..9999.
  let xs = Array.init 10000 int
  let timer = System.Diagnostics.Stopwatch.StartNew()
  // Swap every pair of elements. For an int array each swap is two plain
  // stores; with reference types each store would also run a write barrier.
  for i=0 to xs.Length-2 do
    for j=i+1 to xs.Length-1 do
      let t = xs.[i]
      xs.[i] <- xs.[j]
      xs.[j] <- t
  printfn "%f" timer.Elapsed.TotalSeconds

Change that int to string and the program runs 2x slower because ints can be exchanged directly whereas exchanging reference types must incur two write barriers.
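To make the hidden cost concrete, here is a schematic C++ sketch of what a generational write barrier amounts to (purely conceptual - this is not any real VM's code, and the details vary widely):

#include <unordered_set>

struct Object { Object* field = nullptr; };

// A "remembered set": heap slots that received pointer writes, letting a
// generational collector find old-to-young pointers without scanning the
// whole old generation.
std::unordered_set<Object**> remembered;

// Every pointer store into the heap becomes the store itself plus this
// bookkeeping - the silent extra work the answer is describing.
void write_field(Object** slot, Object* value) {
    *slot = value;                // the store the program asked for
    remembered.insert(slot);      // the barrier the GC silently adds
}

int main() {
    Object parent, child;
    write_field(&parent.field, &child);
}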

Another important situation that people love to brush under the rug is pathological behaviour of conventional GCs. Today, most GCs are generational which means they bump allocate into a nursery and then evacuate survivors to an older generation (typically mark-sweep). This works well when the generational hypothesis (that objects die young and old objects rarely refer to newer objects) holds because most of the objects in the nursery are dead when it is efficiently swept. But objects don't always die young and old objects are sometimes full of pointers to new objects.

In particular, the pathological behaviour of a generational GC is manifested when a program allocates a large array-based mutable data structure (e.g. hash table, hash set, stack, queue or heap) and then fills it with freshly-allocated objects. These new objects survive because they are referred to from an older object, completely violating the generational hypothesis. Consequently, solutions using GCs are typically 3x slower than necessary here.

FWIW, I believe mark-region GCs have the potential to evade these problems in the future. In a mark-region GC, the old generation is a collection of former nurseries. When a thread-local region fills up and is found to contain mostly reachable objects, the whole region can be logically migrated into the old generation without copying any objects, and a new nursery can be allocated (or a non-full old nursery can be recycled).



Answer 7:

If you have a ton of objects that rarely become garbage, the garbage collector will still start up and waste time, only to find that there are just a few objects to reclaim. In extreme cases this may cause a huge performance penalty.



Answer 8:

How about for security reasons? E.g. if you've got a private encryption key in memory, you'd probably want it around for the shortest possible time.
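A hedged sketch of the idea in C++ (hypothetical code, not from the answer): with manual management you can wipe the key's bytes the instant you are done with them, whereas a GC may copy the object around or reclaim it much later:

#include <cstddef>

// Overwrite a buffer in a way the optimizer cannot elide (C11 offers
// memset_s; a volatile write loop is a common portable fallback).
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* v = static_cast<volatile unsigned char*>(p);
    while (n--) *v++ = 0;
}

int main() {
    unsigned char key[32] = {};
    // ... load the key and use it for encryption ...
    secure_zero(key, sizeof key);   // wipe immediately; a GC might instead
    return 0;                       // move the bytes or free them much later
}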

Having said that, I think that, given the way hardware is heading, the art of multithreaded programming may be more worth learning.



Answer 9:

One possible answer is, "Where security / system stability is not a primary requirement".

Bear in mind that applications given free rein over memory can cause all sorts of security concerns, including simply allocating and never freeing memory (a DoS attack that slows the system to a standstill by exhausting available memory). This is a core part of Java's security model, for instance -- its GC ensures this can never happen.

My view, like that of Jon Harrop, is that GC adds overhead to system performance for several reasons (noted in other answers here); it is more indirect but more secure, and it takes responsibility for memory management away from the application developer; but there is always a performance cost for safety nets.



Answer 10:

Garbage Collection vs Leaks

One of the primary reasons for me to avoid garbage collection is to avoid resource leaks in areas where leaking is critically bad. Garbage collection is great when safety and simplicity are your goals, but it is not a way to avoid leaky software.

A common scenario we've encountered is that it's actually hard to avoid resource leaks with GC.

Now this might confuse some people and seem paradoxical - how can garbage collection, combined with less-than-ideal team practices, lead to leaky software? But the non-trivial management of resources in software lies not with resources tied to a limited scope, but with the persistent ones that linger around.

Complex Resource Management

An example of such complexity is a scene graph in a 3D application with millions of lines of code and thousands of objects and classes interacting with each other through events.

In these cases, these persistent objects often store handles/references to resources in the system, perhaps other objects living in the persistent scene graph. In such scenarios, you can have a central resource, R, like a complex 3D model that takes hundreds of megabytes of RAM, being accessed by many different parts of the scene and the user interface. For example, both a camera and light object might store a list of references to objects to exclude from the camera view and lighting system, of which such complex 3D models could be included.

In these cases, and in a team environment, it is not too uncommon for 3 separate developers to write code that stores persistent handles/references to R in dozens of different places in code. When the user removes R, all of these places should release the handle/reference.

Without garbage collection, if one of them fails to do so (perhaps he/she was having a bad day, is one of the less experienced developers, or was in a high-pressure deadline crunch with looser testing and review standards - whatever the reason), a dangling pointer/handle/reference is left behind. Accessing it will crash the application with a segfault, and tracing such a segfault with a debugger will often immediately reveal where and why it happened.

With garbage collection, nothing obvious may happen, except that the longer the software runs, the more resources it leaks. Because one of these places forgot to release the reference, permanently extending the object's lifetime and keeping it in a valid, non-destroyed state, the software may not only keep rising in memory usage but also get slower and slower the longer you run it, processing hidden objects that are no longer visible to users.

To Crash or Not to Crash

In these types of cases, the obvious and glaring crash resulting from this mistake, which can be immediately caught and handled during testing, is sometimes actually preferable to a silent and very difficult-to-spot resource leak that could be a nightmare to track down and may never be fixed.

So if you are working on such a project, where an immediately obvious and correctable crash during testing might actually be preferable to leaky software whose mistakes often fly under the testing radar, then garbage collection - unless it is combined with very careful coding standards and an awareness among every team member of its pitfalls (e.g. the need for weak or phantom references) - can actually do more harm than good. To me, garbage collection works best with much smaller, tighter teams, and with projects that have a higher, not lower, level of expertise in state/resource management, or ones where such resource leaks aren't anywhere near as bad as crashing.
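The weak-reference fix mentioned above can be sketched in C++, where reference counting shares the same lingering-lifetime problem (an illustration only; in traced-GC languages the analogous tools are types like Java's WeakReference and PhantomReference):

#include <memory>
#include <vector>

struct Model { /* imagine hundreds of MB of mesh data */ };

// The camera's exclusion list holds weak references: it can observe the
// model without extending its lifetime.
struct Camera {
    std::vector<std::weak_ptr<Model>> excluded;
};

int main() {
    auto model = std::make_shared<Model>();
    Camera cam;
    cam.excluded.push_back(model);          // observe, don't own

    model.reset();                          // user removes the model: it is
                                            // destroyed now, no hidden leak
    for (auto& w : cam.excluded) {
        if (auto m = w.lock()) {            // safely skip expired entries
            // still alive: use m
        }
    }
}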

From an in-the-trenches developer perspective, a glaring, in-your-face, showstopping bug is often preferable to the very subtle, hidden, 'no one knows what happened but something bad happened' kind of bugs. It often spells the difference between your debugger telling you what happened versus blindly flailing about trying to find needles in a haystack of millions of lines of code. Managing resources in large-scale systems is one of the most difficult things to get right, and garbage collection doesn't actually make this any easier. To crash or not to crash, that is the question, and in these scenarios we're often looking at the dangling handle crash without GC or the mysterious resource leak with it. In performance-critical applications dealing with potentially enormous resources, such leaks are often unacceptable.



Answer 11:

When you are building high-performance apps such as first-person shooter games, you don't want a GC that can impact your app's execution at unpredictable times. Manually managing memory in those apps allows you to decide the right time to free up resources.