Java GC does not gather a “zombie” object for a se

Posted 2019-08-22 10:11

Question:

I am trying to create a mechanism to cache objects into memory, for future use, even if these objects are out of context. There would be a parallel deterministic process which will dictate (by a unique ID) whether the cached object should be retrieved again or if it should completely die. Here is the simplest example, with debug information to make things easier:

package com.panayotis.resurrect;

import java.util.Map;
import java.util.HashMap;

public class ZObject {

    private static int IDGEN = 1;

    protected int id;
    private boolean isKilled = false;

    public static final Map<Integer, ZObject> zombies = new HashMap<>();

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++)
            System.out.println("* INIT: " + new ZObject().toString());
        gc();
        sleep(1000);

        if (!zombies.isEmpty())
            ZObject.revive(2);

        gc();
        sleep(1000);

        if (!zombies.isEmpty())
            ZObject.kill(1);

        gc();
        sleep(1000);
        gc();
        sleep(1000);
        gc();
        sleep(1000);
        gc();
        sleep(1000);
    }

    public ZObject() {
        this.id = IDGEN++;
    }

    protected final void finalize() throws Throwable {
        String debug = "" + zombies.size();
        String name = toString();
        String style;
        if (!isKilled) {
            style = "* Zombie";
            zombies.put(id, this);
        } else {
            style = "*** FINAL ***";
            zombies.remove(id);
            super.finalize();
        }
        dumpZombies(style + " " + debug, name);
    }

    public String toString() {
        return (isKilled ? "killed" : zombies.containsKey(id) ? "zombie" : "alive ") + " " + id;
    }

    public static ZObject revive(int peer) {
        ZObject obj = zombies.remove(peer);
        if (obj != null) {
            System.out.println("* Revive      " + obj.toString());
            obj.isKilled = false;
        } else
            System.out.println("* Not found as zombie " + peer);
        return obj;
    }

    public static void kill(int peer) {
        int size = zombies.size();
        ZObject obj = zombies.get(peer);
        String name = obj == null ? peer + " TERMINATED " : obj.toString();
        zombies.remove(peer);
        dumpZombies("*   Kill " + size, name);
        if (obj != null)
            obj.isKilled = true;
    }

    private static void dumpZombies(String baseMsg, String name) {
        System.out.println(baseMsg + "->" + zombies.size() + " " + name);
        for (Integer key : zombies.keySet())
            System.out.println("*             " + zombies.get(key).toString());
    }

    public static void gc() {
        System.out.println("* Trigger GC");
        for (int i = 0; i < 50; i++)
            System.gc();
    }

    public static void sleep(int howlong) {
        try {
            Thread.sleep(howlong);
        } catch (InterruptedException ex) {
        }
    }
}

This code creates 5 objects, revives the second one (ID 2) and then kills the first one (ID 1). I was expecting:

  • After the first resurrection, since the object has no remaining references, that it would re-enter the zombie state through finalize (which it doesn't)

  • After being killed, that the object would be completely removed from memory, again through the finalize method

It seems, in other words, that finalize is called only once. I have checked that this is not a byproduct of the HashMap object with this code:

package com.panayotis.resurrect;

import java.util.HashMap;

public class TestMap {

    private static final HashMap<Integer, TestMap> map = new HashMap<>();

    private static int IDGEN = 1;
    private final int id;

    public static void main(String[] args) {
        map.put(1, new TestMap(1));
        map.put(2, new TestMap(2));
        map.put(3, new TestMap(3));
        map.remove(1);
        System.out.println("Size: " + map.size());
        for (int i = 0; i < 50; i++)
            System.gc();
    }

    public TestMap(int id) {
        this.id = id;
    }

    protected void finalize() throws Throwable {
        System.out.println("Finalize " + id);
        super.finalize();
    }
}

So, why does this happen? I am using Java 1.8.

EDIT: Since this is not directly possible, any ideas how I can accomplish this?

Answer 1:

This is exactly the specified behavior:

Object.finalize()

After the finalize method has been invoked for an object, no further action is taken until the Java virtual machine has again determined that there is no longer any means by which this object can be accessed by any thread that has not yet died, including possible actions by other objects or classes which are ready to be finalized, at which point the object may be discarded.

The finalize method is never invoked more than once by a Java virtual machine for any given object.

You seem to have a wrong understanding of what the finalize() method does. This method does not free the object’s memory. Declaring a custom non-trivial finalize() method actually prevents the object’s memory from being freed, as the object has to be kept in memory for the execution of that method and afterwards, until the garbage collector has determined that it has become unreachable again. Not calling finalize() again does not imply that the object doesn’t get freed; it implies that it will be freed without calling finalize() again.

Instances of classes without a custom finalize() method, or with a “trivial” finalize() method (one that is empty or consists solely of a super.finalize() call to another trivial finalizer), do not go through the finalization queue at all and are both allocated and reclaimed faster.

That’s why you should never try to implement an object cache just for the memory: the result will always be less efficient than the JVM’s own memory management. But if you are managing an actually expensive resource, you may handle it by separating it into two different kinds of objects: a front-end providing the API to the application, which may get garbage collected whenever the application doesn’t use it, and a back-end object describing the actual resource, which is not directly seen by the application and may get reused.

It is implied that the resource is expensive enough to justify the weight of this separation. Otherwise, it’s not really a resource worth caching.

// front-end class
public class Resource {
    final ActualResource actual;

    Resource(ActualResource actual) {
        this.actual = actual;
    }
    public int getId() {
        return actual.getId();
    }
    public String toString() {
        return actual.toString();
    }
}
class ActualResource {
    int id;

    ActualResource(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    @Override
    public String toString() {
        return "ActualResource[id="+id+']';
    }
}
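// manager class (separate source file, so it carries its own imports)
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResourceManager {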
    static final ReferenceQueue<Resource> QUEUE = new ReferenceQueue<>();
    static final List<ActualResource> FREE = new ArrayList<>();
    static final Map<WeakReference<?>,ActualResource> USED = new HashMap<>();
    static int NEXT_ID;

    public static synchronized Resource getResource() {
        for(;;) {
            Reference<?> t = QUEUE.poll();
            if(t==null) break;
            ActualResource r = USED.remove(t);
            if(r!=null) FREE.add(r);
        }
        ActualResource r;
        if(FREE.isEmpty()) {
            System.out.println("allocating new resource");
            r = new ActualResource(NEXT_ID++);
        }
        else {
            System.out.println("reusing resource");
            r = FREE.remove(FREE.size()-1);
        }
        Resource frontEnd = new Resource(r);
        USED.put(new WeakReference<>(frontEnd, QUEUE), r);
        return frontEnd;
    }
    /**
     * Allow the underlying actual resource to get garbage collected with r.
     */
    public static synchronized void stopReusing(Resource r) {
        USED.values().remove(r.actual);
    }
    public static synchronized void clearCache() {
        FREE.clear();
        USED.clear();
    }
}

Note that the manager class may have arbitrary methods for controlling the caching or manual release of resources; the methods above are just examples. If your API allows the front-end to become invalid, e.g. after calling close(), dispose() or the like, immediate explicit freeing or reuse can be provided without having to wait for the next gc cycle. While finalize() is called at most once, here you can control the number of reuse cycles, including the option of enqueuing zero times.
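The explicit-release idea could be sketched as follows. This is only a minimal illustration under assumptions: `Backend`, `PooledResource`, `SimplePool` and `ExplicitReleaseDemo` are hypothetical names that are not part of the code above, and the weak-reference bookkeeping is left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical back-end object standing in for an expensive resource.
final class Backend {
    final int id;
    Backend(int id) { this.id = id; }
}

// Hypothetical front-end: closing it hands the back-end to the pool at once.
final class PooledResource implements AutoCloseable {
    private final SimplePool pool;
    private Backend backend;

    PooledResource(SimplePool pool, Backend backend) {
        this.pool = pool;
        this.backend = backend;
    }

    int getId() {
        if (backend == null) throw new IllegalStateException("already closed");
        return backend.id;
    }

    @Override
    public void close() {
        if (backend != null) {   // idempotent: a second close() is a no-op
            pool.release(backend);
            backend = null;      // the front-end is invalid from now on
        }
    }
}

// Hypothetical manager that keeps released back-ends for reuse.
final class SimplePool {
    private final Deque<Backend> free = new ArrayDeque<>();
    private int nextId;

    synchronized PooledResource acquire() {
        Backend b = free.isEmpty() ? new Backend(nextId++) : free.pop();
        return new PooledResource(this, b);
    }

    synchronized void release(Backend b) {
        free.push(b);
    }

    synchronized int freeCount() {
        return free.size();
    }
}

final class ExplicitReleaseDemo {
    public static void main(String[] args) {
        SimplePool pool = new SimplePool();
        try (PooledResource first = pool.acquire()) {
            System.out.println("first id = " + first.getId());  // prints "first id = 0"
        }
        // No GC cycle was needed: the back-end is already available again.
        PooledResource second = pool.acquire();
        System.out.println("second id = " + second.getId());    // prints "second id = 0"
    }
}
```

In a fuller design, such a close() path would probably sit alongside the reference queue, which then only acts as a safety net for front-ends that were dropped without being closed.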

Here is some test code

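public class ResourceTest {
    public static void main(String[] args) {
        // the manager's methods are static, so they are called on the class itself
        Resource r1 = ResourceManager.getResource();
        Resource r2 = ResourceManager.getResource();
        System.out.println("r1 = "+r1+", r2 = "+r2);
        r1 = null;
        forceGC();

        r1 = ResourceManager.getResource();
        System.out.println("r1 = "+r1);
        r1 = null;
        forceGC();

        r1 = ResourceManager.getResource();
        System.out.println("r1 = "+r1);

        // from now on, the actual resource may be collected together with r1
        ResourceManager.stopReusing(r1);

        r1 = null;
        forceGC();

        r1 = ResourceManager.getResource();
        System.out.println("r1 = "+r1);
    }

    // System.gc() is only a hint, so ask several times and give the collector time to run
    private static void forceGC() {
        for(int i = 0; i<5; i++) {
            try {
                System.gc();
                Thread.sleep(50);
            } catch(InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt status and stop waiting
                return;
            }
        }
    }
}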

Which will likely (System.gc() still isn’t guaranteed to have an effect) print:

allocating new resource
allocating new resource
r1 = ActualResource[id=0], r2 = ActualResource[id=1]
reusing resource
r1 = ActualResource[id=0]
reusing resource
r1 = ActualResource[id=0]
allocating new resource
r1 = ActualResource[id=2]


Answer 2:

You should not rely on the finalize method for this, as the GC will call it only once for each instance.

When the GC finds an object that can be deleted, it calls finalize. It then checks again for new references; it might find one and keep the object in memory.

On a later run, when the same object again has no references, the GC simply reclaims it without calling finalize again.



Answer 3:

You know what?

I think that your stated requirements would simply be satisfied by a concurrent map.

I am trying to create a mechanism to cache objects into memory, for future use, even if these objects are out of context.

That is simply a map, with the ID as the key; e.g.

Map<IdType, ValueType> cache = new ConcurrentHashMap<>();

When you create an object that needs to be cached, you simply call cache.put(id, object). It will remain cached until you remove it.

There would be a parallel deterministic process which will dictate (by a unique ID) whether the cached object should be retrieved again or if it should completely die.

That's a thread ("parallel deterministic process") that calls cache.remove(id).

Now, if you remove an object from the cache and it is still in use somewhere else (i.e. it is still reachable), then it won't be garbage collected. But that is OK: an object that is still in use shouldn't be collected!
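A minimal sketch of that arrangement, assuming integer IDs and String values purely for illustration (`IdCache` and `IdCacheDemo` are hypothetical names), might look like this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache keyed by an integer ID; entries live until explicitly removed.
final class IdCache {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();

    void put(int id, String value) {
        cache.put(id, value);
    }

    // Returns the cached value, or null if the ID was never cached or was removed.
    String get(int id) {
        return cache.get(id);
    }

    // Called by the controlling thread to let the entry "die".
    String remove(int id) {
        return cache.remove(id);
    }

    int size() {
        return cache.size();
    }
}

final class IdCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        IdCache cache = new IdCache();
        cache.put(1, "one");
        cache.put(2, "two");

        // The "parallel process" deciding that ID 1 should die.
        Thread controller = new Thread(() -> cache.remove(1));
        controller.start();
        controller.join();

        System.out.println(cache.get(1)); // prints "null"
        System.out.println(cache.get(2)); // prints "two"
    }
}
```

A ConcurrentHashMap is used here because the cache is touched by more than one thread; the map itself never consults the garbage collector.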


But what about that stuff with finalize()?

As far as I can see, it does not contribute to your stated requirement at all. Your code seems to be detecting objects that are destined to be deleted, and making them reachable again (your zombies map). That seems to be the opposite of your requirements.

  • If the purpose of finalize() is simply to track when the zombie objects are actually deleted, it can't do that, because finalize() is only ever called once. But why is the finalize() method adding the object to the zombies map?

  • If your requirements are actually misstated and you are really trying to create "immortal" objects (i.e. objects that cannot be deleted), then a plain Map will do that. Just don't remove the object's key, and it will "live" for ever.


Now implementing a cache as a plain map risks creating a memory leak. There are a couple of ways to address that:

  • You can create a subclass of LinkedHashMap, and implement removeEldestEntry() to tell the map when to remove the oldest entry if the cache has too many entries; see the javadocs for details.

  • You can implement a cache as a HashMap<SoftReference<IdType>, ValueType> and use a ReferenceQueue to remove cache entries whose references have been broken by the GC. (Note that soft references will be broken by the GC when a key is no longer strongly reachable, and memory is running short.)
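The first option could be sketched as below. This is a minimal illustration, not a complete cache: `LruCache` is a hypothetical name, and the access-ordered constructor is chosen here so that the "eldest" entry is the least recently used one rather than the first inserted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache: evicts the least recently accessed entry once maxEntries is exceeded.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // third argument: order entries by access, not insertion
        this.maxEntries = maxEntries;
    }

    // Invoked by put() after an insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}

final class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "one");
        cache.put(2, "two");
        cache.get(1);          // touch 1, so 2 becomes the eldest entry
        cache.put(3, "three"); // exceeds the limit and evicts 2
        System.out.println(cache.keySet()); // prints "[1, 3]"
    }
}
```

LinkedHashMap is not thread-safe, so a cache like this would need external synchronization (for example Collections.synchronizedMap) if a separate thread also removes entries.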