GroovyShell in Java8 : memory leak / duplicated cl

Posted 2019-03-16 17:32

Question:

We have a memory leak caused by GroovyShell / Groovy scripts (see the GroovyEvaluator code at the end). The main problems, copied from the MAT analyser, are:

The class "java.beans.ThreadGroupContext", loaded by "<system class loader>", occupies 807,406,960 (33.38%) bytes.

and:

16 instances of "org.codehaus.groovy.reflection.ClassInfo$ClassInfoSet$Segment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x7004e9c80" occupy 1,510,256,544 (62.44%) bytes

We're using Groovy 2.3.11 and Java 8 (1.8.0_25 to be exact).
Upgrading to Groovy 2.4.6 doesn't solve the problem. It only improves memory usage a little, especially non-heap.
The Java args we're using: -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC

BTW, I've read https://dzone.com/articles/groovyshell-and-memory-leaks. We do set the GroovyShell shell to null when it's no longer needed. Using GroovyShell().parse() would probably help, but it isn't really an option for us: we have more than 10 sets, each consisting of 20-100 scripts, and they can be changed at any time (at runtime).

Setting MaxMetaspaceSize should also help, but it doesn't remove the root cause, so I'm still trying to nail it down.


I created a load test to reproduce the problem (see the code at the end). When I run it:

  • heap size, metaspace size and number of classes keep increasing
  • heap dump taken after several minutes is bigger than 4GB
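The counters listed above can also be sampled from inside the JVM. This is a minimal sketch using only the standard java.lang.management API. The class name ClassLoadingProbe is hypothetical, and the pool name "Metaspace" is an assumption that holds on HotSpot for Java 8 and later; on other JVMs the sketch simply reports -1.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper, not part of the original code. */
final class ClassLoadingProbe {

    /** Returns a one-line snapshot of class-loading and Metaspace counters. */
    static String snapshot() {
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        long metaspaceUsed = -1; // -1 means no pool named "Metaspace" was found
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            if ("Metaspace".equals(pool.getName()) && usage != null) {
                metaspaceUsed = usage.getUsed();
            }
        }
        return "loaded=" + classLoading.getLoadedClassCount()
                + " totalLoaded=" + classLoading.getTotalLoadedClassCount()
                + " unloaded=" + classLoading.getUnloadedClassCount()
                + " metaspaceUsedBytes=" + metaspaceUsed;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Calling ClassLoadingProbe.snapshot() every few hundred iterations of the load test would show whether "loaded" keeps growing while "unloaded" stays flat, which is the pattern described here.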

Performance charts for first 3 minutes:

As I've already mentioned I'm using MAT to analyse heap dumps. So let's check Dominator tree report:

A HashMap takes more than 30% of the heap, so let's see what sits inside it by checking the hash entries:

It reports 38 830 entries, including 38 780 entries with keys matching ".class Script."

Another thing, "duplicate classes" report:

We have 400 entries (because the load test defines 400 Groovy scripts), all for "ScriptN" classes. All of them hold references to GroovyClassLoader$InnerLoader.

I've found a similar bug report: https://issues.apache.org/jira/browse/GROOVY-7498 (see the comments at the end and the attached screenshot). Their problem was solved by upgrading Java to 1.8u51. That didn't do the trick for us, though.

Our code:

import groovy.lang.GroovyShell;

import java.util.Collections;
import java.util.Map;

public class GroovyEvaluator
{
    private GroovyShell shell;

    public GroovyEvaluator()
    {
        this(Collections.<String, Object>emptyMap());
    }

    public GroovyEvaluator(final Map<String, Object> contextVariables)
    {
        shell = new GroovyShell();
        for (Map.Entry<String, Object> contextVariable : contextVariables.entrySet())
        {
            shell.setVariable(contextVariable.getKey(), contextVariable.getValue());
        }
    }

    public void setVariables(final Map<String, Object> answers)
    {
        for (Map.Entry<String, Object> questionAndAnswer : answers.entrySet())
        {
            String questionId = questionAndAnswer.getKey();
            Object answer = questionAndAnswer.getValue();
            shell.setVariable(questionId, answer);
        }
    }

    public Object evaluateExpression(String expression)
    {
        return shell.evaluate(expression);
    }

    public void setVariable(final String name, final Object value)
    {
        shell.setVariable(name, value);
    }

    public void close()
    {
        shell = null;
    }
}

Load test:

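import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Run using -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC */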
public class GroovyEvaluatorLoadTest
{
    private static int NUMBER_OF_QUESTIONS = 400;
    private final Map<String, Object> contextVariables = Collections.emptyMap();
    private List<Fact> factMappings = new ArrayList<>();

    public GroovyEvaluatorLoadTest()
    {
        for (int i=0; i<NUMBER_OF_QUESTIONS; i++)
        {
            factMappings.add(new Fact("fact" + i, "question" + i));
        }
    }

    private void callEvaluateExpression(int iter)
    {
        GroovyEvaluator groovyEvaluator = new GroovyEvaluator(contextVariables);

        Map<String, Object> factValues = new HashMap<>();
        Map<String, Object> answers = new HashMap<>();
        for (int i=0; i<NUMBER_OF_QUESTIONS; i++)
        {
            factValues.put("fact" + i, iter + "-fact-value-" + i);
            answers.put("question" + i, iter + "-answer-" + i);
        }

        groovyEvaluator.setVariables(answers);
        groovyEvaluator.setVariable("answers", answers);
        groovyEvaluator.setVariable("facts", factValues);

        for (Fact fact : factMappings)
        {
            groovyEvaluator.evaluateExpression(fact.mapping);
        }
        groovyEvaluator.close();
    }

    public static void main(String [] args)
    {
        GroovyEvaluatorLoadTest test = new GroovyEvaluatorLoadTest();

        for (int i=0; i<995000; i++)
        {
            test.callEvaluateExpression(i);
        }
        test.callEvaluateExpression(0);
    }
}

public class Fact
{
    public final String factId;

    public final String mapping;

    public Fact(final String factId, final String mapping)
    {
        this.factId = factId;
        this.mapping = mapping;
    }
}

Any thoughts? Thx in advance

Answer 1:

OK, this is my solution:

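import groovy.lang.Binding;
import groovy.lang.Script;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class GroovyEvaluator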
{
    private static GroovyScriptCachingBuilder groovyScriptCachingBuilder = new GroovyScriptCachingBuilder();
    private Map<String, Object> variables = new HashMap<>();

    public GroovyEvaluator()
    {
        this(Collections.<String, Object>emptyMap());
    }

    public GroovyEvaluator(final Map<String, Object> contextVariables)
    {
        variables.putAll(contextVariables);
    }

    public void setVariables(final Map<String, Object> answers)
    {
        variables.putAll(answers);
    }

    public void setVariable(final String name, final Object value)
    {
        variables.put(name, value);
    }

    public Object evaluateExpression(String expression)
    {
        final Binding binding = new Binding();
        for (Map.Entry<String, Object> varEntry : variables.entrySet())
        {
            binding.setProperty(varEntry.getKey(), varEntry.getValue());
        }
        Script script = groovyScriptCachingBuilder.getScript(expression);
        synchronized (script)
        {
            script.setBinding(binding);
            return script.run();
        }
    }

}

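import groovy.lang.GroovyShell;
import groovy.lang.Script;

import java.util.HashMap;
import java.util.Map;

public class GroovyScriptCachingBuilder
{
    private final GroovyShell shell = new GroovyShell();
    private final Map<String, Script> scripts = new HashMap<>();

    // synchronized: the builder is shared through a static field and HashMap is not thread-safe
    public synchronized Script getScript(final String expression)
    {
        Script script = scripts.get(expression);
        if (script == null)
        {
            // compile each distinct expression only once, then reuse the Script instance
            script = shell.parse(expression);
            scripts.put(expression, script);
        }
        return script;
    }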
}

The new solution keeps the number of loaded classes and the Metaspace size at a constant level. Non-heap allocated memory usage is ~70 MB.

Also: there is no need to use UseConcMarkSweepGC anymore. You can choose whichever GC you want or stick with a default one :)

Synchronising access to script objects might not be the best option, but it's the only one I found that keeps the Metaspace size at a reasonable level. And even better - it keeps it constant. It might not be the best solution for everyone, but it works great for us. We have big sets of tiny scripts, which means this solution is (pretty much) scalable.
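The reason the class count stays flat is that each distinct expression is compiled only once. Here is a minimal sketch of that compile-once idea using only the JDK. CompileOnceCache is a hypothetical name, and the lambda is just a stand-in for shell.parse(expression) that counts calls instead of compiling anything. It uses ConcurrentHashMap.computeIfAbsent, which is a different technique from the code above, so treat it as an illustration rather than a drop-in replacement.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical generic cache: compiles each distinct source text at most once. */
final class CompileOnceCache<T> {
    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    CompileOnceCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    T get(String source) {
        // ConcurrentHashMap applies the function at most once per key
        return cache.computeIfAbsent(source, compiler);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        // Stand-in for shell.parse(expression): it only counts invocations.
        CompileOnceCache<String> cache = new CompileOnceCache<>(src -> {
            compilations.incrementAndGet();
            return "compiled:" + src;
        });
        // Same shape as the load test: many iterations over the same 400 expressions.
        for (int iter = 0; iter < 1000; iter++) {
            for (int q = 0; q < 400; q++) {
                cache.get("question" + q);
            }
        }
        // prints "400 compilations, 400 cached entries"
        System.out.println(compilations.get() + " compilations, " + cache.size() + " cached entries");
    }
}
```

Note that a cache like this never evicts anything, so if scripts are edited often at runtime the old entries would need to be removed by some other means.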

Let's see some STATS for GroovyEvaluatorLoadTest with GroovyEvaluator using:

  • old approach with shell.evaluate(expression):
0 iterations took 5.03 s
100 iterations took 285.185 s
200 iterations took 821.307 s
  • script.setBinding(binding):
0 iterations took 4.524 s
100 iterations took 19.291 s
200 iterations took 33.44 s
300 iterations took 47.791 s
400 iterations took 62.086 s
500 iterations took 77.329 s

So an additional advantage is that it's lightning fast compared to the previous, leaking solution ;)
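To compare the two runs batch by batch, the timings above can be converted into the cost of each successive 100 iterations. This is a small sketch that assumes the reported figures are cumulative (which the "N iterations took" wording suggests); TimingDeltas is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper for reading the stats above. */
final class TimingDeltas {

    /** Converts cumulative timings into the cost of each successive batch. */
    static double[] deltas(double[] cumulativeSeconds) {
        if (cumulativeSeconds.length < 2) {
            return new double[0];
        }
        double[] out = new double[cumulativeSeconds.length - 1];
        for (int i = 1; i < cumulativeSeconds.length; i++) {
            out[i - 1] = cumulativeSeconds[i] - cumulativeSeconds[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        // Timings copied from the stats above, at 0, 100, 200, ... iterations.
        double[] evaluate = {5.03, 285.185, 821.307};
        double[] cached = {4.524, 19.291, 33.44, 47.791, 62.086, 77.329};
        System.out.println("shell.evaluate: " + Arrays.toString(deltas(evaluate)));
        System.out.println("cached scripts: " + Arrays.toString(deltas(cached)));
    }
}
```

Under that assumption, the cached version costs roughly 14 to 15 s per 100 iterations throughout, while the old approach goes from about 280 s for the first 100 iterations to about 536 s for the next 100, so it gets slower as it runs.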