Garbage collector vs. collections

Posted 2019-05-07 03:04

Question:

I have read a few posts about garbage collection in Java, but I still cannot decide whether explicitly clearing a collection is considered good practice. Since I could not find a clear answer, I decided to ask here.

Consider this example:

List<String> list = new LinkedList<>();
// here we use the list, perhaps adding hundreds of items in it...
// ...and now the work is done, the list is not needed anymore
list.clear();
list = null;

From what I saw in the implementations of e.g. LinkedList or HashSet, the clear() method basically just loops over all the items in the collection, setting each element (and, in the case of LinkedList, also the references to the next and previous nodes) to null.

If I got it right, setting list to null just removes one reference to the list object. If that was the only reference to it, the garbage collector will eventually take care of it. I just don't know how long it would take until the list's elements are also processed by the garbage collector in this case.

So my question is - do the last two lines of the above listed example code actually help the garbage collector to work more efficiently (i.e. to collect the list's elements earlier) or would I just make my application busy with "irrelevant tasks"?

Answer 1:

The last two lines do not help.

  • Once the list variable goes out of scope*, if that's the last reference to the linked list then the list becomes eligible for garbage collection. Setting list to null immediately beforehand adds no value.

  • Once the list becomes eligible for garbage collection, so too do its elements, provided the list holds the only references to them. Clearing the list is unnecessary.

For the most part you can trust the garbage collector to do its job and do not need to "help" it.

* Pedantically speaking, it's not scope that controls garbage collection, but reachability. Reachability isn't easy to sum up in one sentence. See this Q&A for an explanation of this distinction.


One common exception to this rule is if you have code that will retain references longer than they're needed. The canonical example of this is with listeners. If you add a listener to some component, and later on that listener is no longer needed, you need to explicitly remove it. If you don't, that listener can inhibit garbage collection of both itself and of the objects it has references to.

Let's say I added a listener to a button like so:

button.addListener(event -> label.setText("clicked!"));

Then later on the label is removed, but the button remains.

window.removeChild(label);

This is a problem because the button has a reference to the listener and the listener has a reference to the label. The label can't be garbage collected even though it's no longer visible on screen.

This is a time to take action and get on the GC's good side. I need to remember the listener when I add it...

Listener listener = event -> label.setText("clicked!");
button.addListener(listener);

...so that I can remove it when I'm done with the label:

window.removeChild(label);
button.removeListener(listener);
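
A minimal self-contained sketch of the pattern above. The Button, Label, and Listener types here are hypothetical stand-ins written for this example, not any real UI toolkit's API. They only model the two references that matter: the button holds its listeners, and the listener captures the label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a UI toolkit; not a real library API.
interface Listener {
    void onEvent(String event);
}

class Label {
    private String text = "";
    void setText(String text) { this.text = text; }
    String getText() { return text; }
}

class Button {
    // The button strongly references every registered listener.
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) { listeners.add(l); }
    boolean removeListener(Listener l) { return listeners.remove(l); }
    int listenerCount() { return listeners.size(); }

    void click() {
        for (Listener l : listeners) {
            l.onEvent("click");
        }
    }
}

class ListenerLeakSketch {
    public static void main(String[] args) {
        Button button = new Button();
        Label label = new Label();

        // Keep the listener in a variable so the same instance can be removed later.
        Listener listener = event -> label.setText("clicked!");
        button.addListener(listener);
        button.click();
        System.out.println(label.getText());

        // Done with the label: unregister so the button no longer reaches it.
        button.removeListener(listener);
        System.out.println(button.listenerCount());
    }
}
```

Passing a fresh lambda to removeListener would not work here, because each lambda expression evaluates to its own object; that is why the listener is stored in a variable first.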


Answer 2:

It depends on the following factors:

  • how clear() is implemented
  • the allocation patterns for the entries held by the collection
  • the garbage collector
  • whether there might be other things holding onto the collection or subviews of it (does not apply to your example but common in the real world)

For a primitive, non-generational tracing garbage collector, clearing out references only means extra work without making things much easier for the GC. But clearing may still help if you cannot guarantee that all references to the collection are nulled out in a timely manner.

For generational GCs, and especially G1GC, nulling out references inside a collection (or a reference array) may be helpful under some circumstances by reducing cross-region references.

But that only helps if you actually have allocation patterns that create objects in different regions and put them into a collection living in another region. It also depends on the clear() implementation nulling out those references, which turns clearing into an O(n) operation when it could often be implemented as an O(1) one.
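
To make the O(n) versus O(1) distinction concrete, here is a small sketch. SimpleBag is a hypothetical container written only for this comparison; it is not how any JDK collection is implemented, and the initial capacity of 8 is an arbitrary assumption.

```java
import java.util.Arrays;

// Hypothetical minimal array-backed container, used only to contrast two clear() strategies.
class SimpleBag<E> {
    private static final int INITIAL_CAPACITY = 8; // arbitrary choice for this sketch

    private Object[] items = new Object[INITIAL_CAPACITY];
    private int size = 0;

    void add(E e) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow by doubling
        }
        items[size++] = e;
    }

    int size() { return size; }
    int capacity() { return items.length; }

    // O(n): null every used slot and keep the backing array for reuse.
    // The elements stop being referenced from this container immediately.
    void clearByNulling() {
        for (int i = 0; i < size; i++) {
            items[i] = null;
        }
        size = 0;
    }

    // O(1): swap in a fresh small array. The old array still points at its
    // elements until the collector reclaims the old array itself.
    void clearByDropping() {
        items = new Object[INITIAL_CAPACITY];
        size = 0;
    }
}

class ClearStrategiesSketch {
    public static void main(String[] args) {
        SimpleBag<String> bag = new SimpleBag<>();
        for (int i = 0; i < 100; i++) {
            bag.add("item" + i);
        }
        bag.clearByNulling();
        System.out.println(bag.size() + " elements, capacity " + bag.capacity());

        for (int i = 0; i < 100; i++) {
            bag.add("item" + i);
        }
        bag.clearByDropping();
        System.out.println(bag.size() + " elements, capacity " + bag.capacity());
    }
}
```

Both methods leave the container empty from the caller's point of view. They differ only in how much work the application does up front and in which references remain for the collector to trace.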

So for your concrete example the answer would be as follows:

If

  • your list is long-lived
  • the lists created on that code-path make up/hold onto a significant fraction of the garbage your application produces
  • you're using G1 or a similar multi-generational collector
  • the list slowly accumulates objects before eventually being released (this usually puts them in different regions, thus creating cross-region references)
  • you wish to trade CPU-time on clearing for reduced GC workload
  • the clear() implementation is O(n) instead of O(1), i.e. it nulls out all entries. OpenJDK 1.8's LinkedList does this.

then it may be beneficial to call clear() before releasing the collection itself.

So at best this is a very workload-specific micro-optimization that should only be applied after profiling/monitoring the application under realistic conditions and determining that GC overhead justifies the extra cost of clearing.


For reference, OpenJDK 1.8's LinkedList::clear

/**
 * Removes all of the elements from this list.
 * The list will be empty after this call returns.
 */
public void clear() {
    // Clearing all of the links between nodes is "unnecessary", but:
    // - helps a generational GC if the discarded nodes inhabit
    //   more than one generation
    // - is sure to free memory even if there is a reachable Iterator
    for (Node<E> x = first; x != null; ) {
        Node<E> next = x.next;
        x.item = null;
        x.next = null;
        x.prev = null;
        x = next;
    }
    first = last = null;
    size = 0;
    modCount++;
}


Answer 3:

I don't believe clear() will help in this instance. The GC will remove items once there are no more references to them, so in theory just setting list = null will have the same effect. You cannot control when the GC will run, so in my view it's not worth worrying about unless you have specific resource or performance requirements. Personally I'd stick with list = null;

If you want to reuse the list variable, then of course clear() is the best option rather than creating a new list object.
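
As a rough illustration of the reuse case, the sketch below keeps one scratch list and clears it between batches. The class name, the batchSums method, and the batch data are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseListSketch {
    // Sums each batch, using one scratch list that is cleared between batches.
    static List<Integer> batchSums(int[][] batches) {
        List<Integer> sums = new ArrayList<>();
        List<Integer> scratch = new ArrayList<>(); // reused across iterations

        for (int[] batch : batches) {
            scratch.clear(); // empty the list but keep the same list object
            for (int value : batch) {
                scratch.add(value);
            }
            int sum = 0;
            for (int value : scratch) {
                sum += value;
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        int[][] batches = { {1, 2, 3}, {10, 20} };
        System.out.println(batchSums(batches));
    }
}
```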



Answer 4:

In Java an object is either alive (reachable via a reference owned by some other object) or dead (not reachable via a reference owned by any other object). Objects that are only reachable from dead objects are also considered dead and eligible for garbage collection.

If no live object has a reference to your collection, then it is unreachable and eligible for garbage collection. What this also means is that all of your collection's elements (and any other helper objects that it may have created) are also unreachable unless some other live object has a reference to them.

Therefore, the clear method has no effect other than erasing a reference from one dead object to another. They will get garbage collected either way.
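
The argument can be checked with a toy model of reachability. This is not how the JVM is implemented: objects are plain strings, references are entries in a map, and the names "list", "cache", "a", and "b" are invented for this sketch. It only shows that once the list is no longer reachable from a root, clearing it first does not change which objects stay alive.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilitySketch {
    // Returns every object name reachable from the roots by following references.
    static Set<String> reachable(Set<String> roots, Map<String, List<String>> refs) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String target : refs.getOrDefault(current, Collections.<String>emptyList())) {
                if (seen.add(target)) { // add() is true only for newly discovered objects
                    work.push(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("list", Arrays.asList("a", "b"));           // the list holds a and b
        refs.put("cache", Collections.singletonList("b"));   // another live object also holds b

        // While the list variable is still a root, everything is alive.
        Set<String> beforeNull = reachable(new HashSet<>(Arrays.asList("list", "cache")), refs);

        // After list = null, only the cache is a root.
        Set<String> afterNull = reachable(Collections.singleton("cache"), refs);

        // Calling clear() before nulling removes the list's outgoing references...
        refs.put("list", Collections.<String>emptyList());
        Set<String> afterClearAndNull = reachable(Collections.singleton("cache"), refs);

        System.out.println(beforeNull);
        System.out.println(afterNull);
        // ...but the set of live objects is the same either way.
        System.out.println(afterNull.equals(afterClearAndNull));
    }
}
```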