Why no Reference Counting + Garbage Collection in

2019-01-12 21:58发布

I come from a C++ background and I've been working with C# for about a year. Like many others I'm flummoxed as to why deterministic resource management is not built-in to the language. Instead of deterministic destructors we have the dispose pattern. People start to wonder whether spreading the IDisposable cancer through their code is worth the effort.

In my C++-biased brain it seems like using reference-counted smart pointers with deterministic destructors is a major step up from a garbage collector that requires you to implement IDisposable and call dispose to clean up your non-memory resources. Admittedly, I'm not very smart... so I'm asking this purely from a desire to better understand why things are the way they are.

What if C# were modified such that:

Objects are reference counted. When an object's reference count goes to zero, a resource cleanup method is called deterministically on the object, then the object is marked for garbage collection. Garbage collection occurs at some non-deterministic time in the future at which point memory is reclaimed. In this scenario you don't have to implement IDisposable or remember to call Dispose. You just implement the resource cleanup function if you have non-memory resources to release.

  • Why is that a bad idea?
  • Would that defeat the purpose of the garbage collector?
  • Would it be feasible to implement such a thing?

EDIT: From the comments so far, this is a bad idea because

  1. GC is faster without reference counting
  2. problem of dealing with cycles in the object graph

I think number one is valid, but number two is easy to deal with using weak references.

So does the speed optimization outweigh the cons that you:

  1. may not free a non-memory resource in a timely manner
  2. might free a non-memory resource too soon

If your resource cleanup mechanism is deterministic and built-in to the language you can eliminate those possibilities.

10条回答
手持菜刀,她持情操
2楼-- · 2019-01-12 22:10

I know something about garbage collection. Here is a short summary because a full explanation is beyond the bounds of this question.

.NET uses a copying and compacting generational garbage collector. This is more advanced than reference counting and has the benefit of being able to collect objects that refer to themselves either directly, or through a chain.

Reference counting will not collect cycles. Reference counting also has a lower throughput (slower overall) but with the benefit of faster pauses (maximal pauses are smaller) than a tracing collector.

查看更多
Rolldiameter
3楼-- · 2019-01-12 22:11

There is a difference between C++ style smart pointer reference counting and reference counting garbage collection. I've also talked about the differences on my blog but here is a quick summary:

C++ Style Reference Counting:

  • Unbounded cost on decrement: if the root of a large data structure is decremented to zero there is an unbounded cost to free all the data.

  • Manual cycle collection: to prevent cyclic data structures from leaking memory the programmer must manually break any potential structures by replacing part of the cycle with a weak smart pointer. This is another source of potential defects.

Reference Counting Garbage Collection

  • Deferred RC: Changes to an object reference count are ignored for stack and register references. Instead when a GC is triggered these objects are retained by collecting a root set. Changes to the reference count can be deferred and processed in batches. This results in higher throughput.

  • Coalescing: using a write barrier it is possible to coalesce changes to the reference count. This makes it possible to ignore most changes to an objects reference count improving RC performance for frequently mutated references.

  • Cycle Detection: for a complete GC implementation a cycle detector must also be used. However it is possible to perform cycle detection in incremental fashion, which in turn means bounded GC time.

Basically it is possible to implement a high performance RC based garbage collector for runtimes such as Java's JVM and the .net CLR runtime.

I think tracing collectors are partly in use for historical reasons: many of the recent improvements in reference counting came after both the JVM and .net runtime were released. Research work also takes time to transition into production projects.

Deterministic Resource Disposal

This is pretty much a separate issue. The .net runtime makes this possible using the IDisposable interface, example below. I also like Gishu's answer.


@Skrymsli, this is the purpose of the "using" keyword. E.g.:

public abstract class BaseCriticalResource : IDiposable {
    ~ BaseCriticalResource () {
        Dispose(false);
    }

    public void Dispose() {
        Dispose(true);
        GC.SuppressFinalize(this); // No need to call finalizer now
    }

    protected virtual void Dispose(bool disposing) { }
}

Then to add a class with a critical resource:

public class ComFileCritical : BaseCriticalResource {

    private IntPtr nativeResource;

    protected override Dispose(bool disposing) {
        // free native resources if there are any.
        if (nativeResource != IntPtr.Zero) {
            ComCallToFreeUnmangedPointer(nativeResource);
            nativeResource = IntPtr.Zero;
        }
    }
}

Then to use it is as simple as:

using (ComFileCritical fileResource = new ComFileCritical()) {
    // Some actions on fileResource
}

// fileResource's critical resources freed at this point

See also implementing IDisposable correctly.

查看更多
闹够了就滚
4楼-- · 2019-01-12 22:13

The garbage collector does not require you to write a Dispose method for every class/type that you define. You only define one when you need to explicitly do something to cleanup ; when you have explicitly allocated native resources. Most of the time, the GC just reclaims memory even if you only do something like new() up an object.

The GC does reference counting - however it does it in a different way by finding which objects are 'reachable' (Ref Count > 0) every time it does a collection... it just doesn't do it in a integer counter way. . Unreachable objects are collected (Ref Count = 0). This way the runtime doesn't have to do housekeeping/updating tables everytime an object is assigned or released... should be faster.

The only major difference between C++ (deterministic) and C# (non-deterministic) is when the object would be cleaned up. You can't predict the exact moment an object would be collected in C#.

Umpteenth plug: I'd recommend reading Jeffrey Richter's standup chapter on the GC in CLR via C# in case you're really interested in how the GC works.

查看更多
SAY GOODBYE
5楼-- · 2019-01-12 22:19

I come from a C++ background and I've been working with C# for about a year. Like many others I'm flummoxed as to why deterministic resource management is not built-in to the language.

The using construct provides "deterministic" resource management and is built into the C# language. Note that by "deterministic" I mean Dispose is guaranteed to have been called before the code after the using block starts executing. Note also that this is not what the word "deterministic" means but everyone seems to abuse it in this context in that way, which sucks.

In my C++-biased brain it seems like using reference-counted smart pointers with deterministic destructors is a major step up from a garbage collector that requires you to implement IDisposable and call dispose to clean up your non-memory resources.

The garbage collector does not require you to implement IDisposable. Indeed, the GC is completely oblivious to it.

Admittedly, I'm not very smart... so I'm asking this purely from a desire to better understand why things are the way they are.

Tracing garbage collection is a fast and reliable way to emulate an infinite memory machine, freeing the programmer from the burden of manual memory management. This eliminated several classes of bugs (dangling pointers, free too soon, double free, forgot to free).

What if C# were modified such that:

Objects are reference counted. When an object's reference count goes to zero, a resource cleanup method is called deterministically on the object,

Consider an object shared between two threads. The threads race to decrement the reference count to zero. One thread will win the race and the other will be responsible for cleanup. That is non-deterministic. The belief that reference counting is inherently deterministic is a myth.

Another common myth is that reference counting frees objects at the earliest possible point in the program. It doesn't. Decrements are always deferred, usually to the end of scope. This keeps objects alive for longer than necessary leaving what is called "floating garbage" lying around. Note that, in particular, some tracing garbage collectors can and do recycle objects earlier than scope-based reference counting implementations.

then the object is marked for garbage collection. Garbage collection occurs at some non-deterministic time in the future at which point memory is reclaimed. In this scenario you don't have to implement IDisposable or remember to call Dispose.

You don't have to implement IDisposable for garbage collected objects anyway, so that is a non-benefit.

You just implement the resource cleanup function if you have non-memory resources to release.

Why is that a bad idea?

Naive reference counting is very slow and leaks cycles. For example, Boost's shared_ptr in C++ is up to 10x slower than OCaml's tracing GC. Even naive scope-based reference counting is non-deterministic in the presence of multithreaded programs (which is almost all modern programs).

Would that defeat the purpose of the garbage collector?

Not at all, no. In fact it is a bad idea that was invented in the 1960s and subjected to intense academic study for the next 54 years concluding that reference counting sucks in the general case.

Would it be feasible to implement such a thing?

Absolutely. Early prototype .NET and JVM used reference counting. They also found it sucked and dropped it in favor of tracing GC.

EDIT: From the comments so far, this is a bad idea because

GC is faster without reference counting

Yes. Note that you can make reference counting much faster by deferring counter increments and decrements but that sacrifices the determinism that you crave so very much and it is still slower than tracing GC with today's heap sizes. However, reference counting is asymptotically faster so at some point in the future when heaps get really big maybe we will start using RC in production automated memory management solutions.

problem of dealing with cycles in the object graph

Trial deletion is an algorithm specifically designed to detect and collect cycles in reference counted systems. However, it is slow and non-deterministic.

I think number one is valid, but number two is easy to deal with using weak references.

Calling weak references "easy" is a triumph of hope over reality. They are a nightmare. Not only are they unpredictable and difficult to architect but they pollute APIs.

So does the speed optimization outweigh the cons that you:

may not free a non-memory resource in a timely manner

Doesn't using free non-memory resource in a timely manner?

might free a non-memory resource too soon If your resource cleanup mechanism is deterministic and built-in to the language you can eliminate those possibilities.

The using construct is deterministic and built into the language.

I think the question you really want to ask is why doesn't IDisposable use reference counting. My response is anecdotal: I've been using garbage collected languages for 18 years and I have never needed to resort to reference counting. Consequently, I much prefer simpler APIs that aren't polluted with incidental complexity like weak references.

查看更多
男人必须洒脱
6楼-- · 2019-01-12 22:19

The object implemeting IDisposable must also implement a finalizer called by the GC when the user doesn't explicit call Dispose - see IDisposable.Dispose at MSDN.

The whole point of IDisposable is that the GC is running at some non-deterministic time and you implement IDisposable because you hold a valuable resource and wants to free it at a deterministic time.

So your proposal would change nothing in terms of IDisposable.

Edit:

Sorry. Didn't read your proposal correctly. :-(

Wikipedia has a simple explanation of the shortcomings of References counted GC

查看更多
虎瘦雄心在
7楼-- · 2019-01-12 22:26

Reference count

The costs of using reference counts are twofold: First, every object requires the special reference count field. Typically, this means an extra word of storage must be allocated in each object. Second, every time one reference is assigned to another, the reference counts must be adjusted. This increases significantly the time taken by assignment statements.

Garbage Collection in .NET

C# does not use reference counting of the objects. Instead it maintains a graph of the object references from the stack and navigates from the root to cover up all the referenced objects. All the referenced objects in the graph are compacted in the heap to that a contiguous memory is available for future objects. Memory for all the unreferenced objects who do not need to be finalized is reclaimed. Those that are unreferenced but have finalizers to be executed on them are moved to a separate queue called the f-reachable queue where the garbage collector calls their finalizers in the background.

In addition to the above GC uses the concept of generations for a more efficient garbage collection. It is based on the following concepts 1. It is faster to compact the memory for a portion of the managed heap than for the entire managed heap 2. Newer objects will have shorter lifetimes and older objects will have longer lifetimes 3. Newer objects tend to be related to each other and accessed by the application around the same time

The managed heap is divided into three generations: 0, 1, and 2. The new objects are stored in gen 0. Objects that are not reclaimed by a cycle of GC are promoted to the next gen. So if newer objects which are in gen 0 survive GC cycle 1, then they are promoted to gen 1. Those among these that survive GC cycle 2 are promoted to gen 2. Because the garbage collector supports only three generations, objects in generation 2 that survive a collection remain in generation 2 until they are determined to be unreachable in a future collection.

The garbage collector performs a collection when generation 0 is full and memory for new object needs to be allocated. If a collection of generation 0 does not reclaim enough memory, the garbage collector can perform a collection of generation 1, then generation 0. If this does not reclaim enough memory, the garbage collector can perform a collection of generations 2, 1, and 0.

Thus GC is more efficient than reference count.

查看更多
登录 后发表回答