Why Java and Python garbage collection methods are

2019-01-30 00:53发布

Python uses the reference count method to handle object life time. So an object that has no more use will be immediately destroyed.

But, in Java, the GC(garbage collector) destroys objects which are no longer used at a specific time.

Why does Java choose this strategy and what is the benefit from this?

Is this better than the Python approach?

9条回答
forever°为你锁心
2楼-- · 2019-01-30 00:54

Garbage collection is faster (more time efficient) than reference counting, if you have enough memory. For example, a copying gc traverses the "live" objects and copies them to a new space, and can reclaim all the "dead" objects in one step by marking a whole memory region. This is very efficient, if you have enough memory. Generational collections use the knowledge that "most objects die young"; often only a few percent of objects have to be copied.

[This is also the reason why gc can be faster than malloc/free]

Reference counting is much more space efficient than garbage collection, since it reclaims memory the very moment it gets unreachable. This is nice when you want to attach finalizers to objects (e.g. to close a file once the File object gets unreachable). A reference counting system can work even when only a few percent of the memory is free. But the management cost of having to increment and decrement counters upon each pointer assignment cost a lot of time, and some kind of garbage collection is still needed to reclaim cycles.

So the trade-off is clear: if you have to work in a memory-constrained environment, or if you need precise finalizers, use reference counting. If you have enough memory and need the speed, use garbage collection.

查看更多
看我几分像从前
3楼-- · 2019-01-30 00:57

There are drawbacks of using reference counting. One of the most mentioned is circular references: Suppose A references B, B references C and C references B. If A were to drop its reference to B, both B and C will still have a reference count of 1 and won't be deleted with traditional reference counting. CPython (reference counting is not part of python itself, but part of the C implementation thereof) catches circular references with a separate garbage collection routine that it runs periodically...

Another drawback: Reference counting can make execution slower. Each time an object is referenced and dereferenced, the interpreter/VM must check to see if the count has gone down to 0 (and then deallocate if it did). Garbage Collection does not need to do this.

Also, Garbage Collection can be done in a separate thread (though it can be a bit tricky). On machines with lots of RAM and for processes that use memory only slowly, you might not want to be doing GC at all! Reference counting would be a bit of a drawback there in terms of performance...

查看更多
时光不老,我们不散
4楼-- · 2019-01-30 00:57

Darren Thomas gives a good answer. However, one big difference between the Java and Python approaches is that with reference counting in the common case (no circular references) objects are cleaned up immediately rather than at some indeterminate later date.

For example, I can write sloppy, non-portable code in CPython such as

def parse_some_attrs(fname):
    return open(fname).read().split("~~~")[2:4]

and the file descriptor for that file I opened will be cleaned up immediately because as soon as the reference to the open file goes away, the file is garbage collected and the file descriptor is freed. Of course, if I run Jython or IronPython or possibly PyPy, then the garbage collector won't necessarily run until much later; possibly I'll run out of file descriptors first and my program will crash.

So you SHOULD be writing code that looks like

def parse_some_attrs(fname):
    with open(fname) as f:
        return f.read().split("~~~")[2:4]

but sometimes people like to rely on reference counting to always free up their resources because it can sometimes make your code a little shorter.

I'd say that the best garbage collector is the one with the best performance, which currently seems to be the Java-style generational garbage collectors that can run in a separate thread and has all these crazy optimizations, etc. The differences to how you write your code should be negligible and ideally non-existent.

查看更多
Explosion°爆炸
5楼-- · 2019-01-30 00:58

Actually reference counting and the strategies used by the Sun JVM are all different types of garbage collection algorithms.

There are two broad approaches for tracking down dead objects: tracing and reference counting. In tracing the GC starts from the "roots" - things like stack references, and traces all reachable (live) objects. Anything that can't be reached is considered dead. In reference counting each time a reference is modified the object's involved have their count updated. Any object whose reference count gets set to zero is considered dead.

With basically all GC implementations there are trade offs but tracing is usually good for high through put (i.e. fast) operation but has longer pause times (larger gaps where the UI or program may freeze up). Reference counting can operate in smaller chunks but will be slower overall. It may mean less freezes but poorer performance overall.

Additionally a reference counting GC requires a cycle detector to clean up any objects in a cycle that won't be caught by their reference count alone. Perl 5 didn't have a cycle detector in its GC implementation and could leak memory that was cyclic.

Research has also been done to get the best of both worlds (low pause times, high throughput): http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/urc-oopsla-2003.pdf

查看更多
太酷不给撩
6楼-- · 2019-01-30 01:02

Reference counting is particularly difficult to do efficiently in a multi-threaded environment. I don't know how you'd even start to do it without getting into hardware assisted transactions or similar (currently) unusual atomic instructions.

Reference counting is easy to implement. JVMs have had a lot of money sunk into competing implementations, so it shouldn't be surprising that they implement very good solutions to very difficult problems. However, it's becoming increasingly easy to target your favourite language at the JVM.

查看更多
Emotional °昔
7楼-- · 2019-01-30 01:09

The latest Sun Java VM actually have multiple GC algorithms which you can tweak. The Java VM specifications intentionally omitted specifying actual GC behaviour to allow different (and multiple) GC algorithms for different VMs.

For example, for all the people who dislike the "stop-the-world" approach of the default Sun Java VM GC behaviour, there are VM such as IBM's WebSphere Real Time which allows real-time application to run on Java.

Since the Java VM spec is publicly available, there is (theoretically) nothing stopping anyone from implementing a Java VM that uses CPython's GC algorithm.

查看更多
登录 后发表回答