kubernetes pod memory - java gc logs

2020-06-04 15:02发布

问题:

On the kubernetes dashboard, there's a pod wherein the Memory Usage (bytes) is displayed as 904.38Mi.

This pod holds the java app that was ran with -Xms512m -Xmx1024m, and on kubernetes deployment file -> requests.memory = 512M, limits.memory = 1.5G.

I've enabled gc logs and seeing these on the pod logs:

[2020-04-29T15:41:32.051+0000] GC(1533) Phase 1: Mark live objects
[2020-04-29T15:41:32.133+0000] GC(1533) Phase 1: Mark live objects 81.782ms
[2020-04-29T15:41:32.133+0000] GC(1533) Phase 2: Compute new object addresses
[2020-04-29T15:41:32.145+0000] GC(1533) Phase 2: Compute new object addresses 11.235ms
[2020-04-29T15:41:32.145+0000] GC(1533) Phase 3: Adjust pointers
[2020-04-29T15:41:32.199+0000] GC(1533) Phase 3: Adjust pointers 54.559ms
[2020-04-29T15:41:32.199+0000] GC(1533) Phase 4: Move objects
[2020-04-29T15:41:32.222+0000] GC(1533) Phase 4: Move objects 22.406ms
[2020-04-29T15:41:32.222+0000] GC(1533) Pause Full (Allocation Failure) 510M->127M(680M) 171.359ms
[2020-04-29T15:41:32.222+0000] GC(1532) DefNew: 195639K->0K(195840K)
[2020-04-29T15:41:32.222+0000] GC(1532) Tenured: 422769K->130230K(500700K)
[2020-04-29T15:41:32.222+0000] GC(1532) Metaspace: 88938K->88938K(1130496K)
[2020-04-29T15:41:32.228+0000] GC(1532) Pause Young (Allocation Failure) 603M->127M(614M) 259.018ms
[2020-04-29T15:41:32.228+0000] GC(1532) User=0.22s Sys=0.05s Real=0.26s

How did kubernetes arrived at 904.38Mi usage? If I understood correctly, the current usages are only:

DefNew (young) -      0k
Tenured        - 130230K
Metaspace      -  88938K
Sum            - 216168K

Running ps shows there are no other processes running on the pod aside from this java app.
Anyone can shed some light to this?

(edited) When the pod first started and let run for few mins, the memory usage is displayed as around 500mb only, then let the requests come in it will burst to 900mb-1gb, then when all has been processed, the memory usage on k8s dashboard doesn't go down below 900mb, even though based on GC logs, heap is GC'ed ok.

回答1:

There are a lot of things going on here. Let's take each at a time.

It seems you are using a single container per pod (though you could have many containers per pod). The requests.memory and limits.memory are specific to the container, kubernets computes limits and requests per pod as the sum of all the containers limitations.

So think about it - you are saying that a pod shows 904.38Mi, but you show requests.memory and limits.memory, which is per container. That is why I assume you have one container per pod. This is a general intro that does not answer your question - but we will get there.

Then comes the fact that a pod is started by docker, which is started with kubectl and that reads requires.memory and limits.memory. To make this a bit simpler : what you set in limits.memory, will be passed as docker -m. So, in your case, the total memory used for the docker process is 1.5GC. Remember that is the entire process limit, not just the heap. A java process is a lot more than the heap, which you specify with -Xms512m -Xmx1024m. So to answer your question:

How did kubernetes arrived at 904.38Mi usage?

This is what the entire process is currently taking, not just the heap. From the very short log files that you posted - your application is just fine.

EDIT

I actually did not had kubernetes dashboard in my environment to test this specifically, so had to install it to really understand what is going on. I had a hint against the majority of things, but to make sure, I did some tests.

First things first: what does that number in the dashboard mean? Took a while to find/understand, but that is the actual resident memory of the process, which actually is a very good thing.

Any sane OS knows that when someone requests memory from it, it rarely needs/utilizes it all, as such, it gives the memory to it in a lazy fashion. This is easy to prove in k8s. Suppose I have a jdk-13 JVM and start it with:

kubectl run jdk-13 
    --image=jdk-13 
    --image-pull-policy=Never 
    --limits "memory=100Mi" 
    --requests "memory=10Mi" 
    --command -- /bin/sh -c "while true; do sleep 5; done".

Notice requests.memory=10Mi and limits.memory=100Mi. Reading the answer from the beginning, you already know that the specific pod will be started with docker -m 100m... because limits.memory=100Mi. That is easy to prove, just sh into the pod:

 kubectl exec -it jdk-13-b8d656977-rpzrg -- /bin/sh

and find out what the cgroup says:

 # cat /sys/fs/cgroup/memory/memory.limit_in_bytes
 104857600 // 100MB

perfect! so the pod's memory limitation is 100 MB max, but what is the current memory utilization, that is what is the resident memory taken?

kubectl top pod
   NAME                          CPU(cores)   MEMORY(bytes)
   jdk-13-b8d656977-rpzrg           1m           4Mi

OK, so the current memory utilization is only 4MB. You can "make sure" that this is indeed accurate, if you do:

kubectl exec -it jdk-13-b8d656977-rpzrg -- /bin/sh

and inside that pod issue:

top -o %MEM

and notice that RES memory is on par with the one reported via dashboard or kubectl top pod.

And now let's do a test. Suppose I have this very simple code in that pod:

// run this with: java "-Xlog:gc*=debug" -Xmx100m -Xms20m  HeapTest
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class HeapTest {

    public static void main(String[] args) throws Exception {

        // allocate 1 MB every 3 seconds
        for (int i = 0; i < 40; ++i) {
            byte[] b = new byte[1024 * 1024 * 1];
            b[i] = 1;
            System.out.println(Arrays.hashCode(b));
            LockSupport.parkNanos(TimeUnit.of(ChronoUnit.SECONDS).toNanos(3));
        }
    }
}

I allocate 1MB every 3 seconds for around 2 minutes. When I look at this process in dashboard, I do see that at some point in time, the memory grows. After the program ends, dashboard reports the drop in memory back. Good! That means the memory is given back and RSS memory drops. This is how it looks like in dashboard:

Now let's change this code just a bit. Let's add some GC in there and let's never finish this process (you know like typical spring-boot applications do):

import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class HeapTest {

    public static void main(String[] args) throws Exception {

        // allocate 1 MB every 3 seconds
        for (int i = 0; i < 40; ++i) {
            byte[] b = new byte[1024 * 1024 * 1];
            b[i] = 1;
            System.out.println(Arrays.hashCode(b));
            LockSupport.parkNanos(TimeUnit.of(ChronoUnit.SECONDS).toNanos(3));
        }
        for (int i = 0; i < 10; i++) {
            Thread.sleep(500);
            System.gc();
        }

        while (true) {
            try {
                Thread.sleep(TimeUnit.of(ChronoUnit.SECONDS).toMillis(5));
                Thread.onSpinWait();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

    }
}

I run this with:

java "-Xlog:heap*=debug" 
     "-Xlog:gc*=debug" 
     "-Xlog:ergo*=debug" 
     -Xmx100m 
     -Xms20m
     HeapTest

on inspecting logs (just like in your example), I do see that heap is collected just fine. But when I look at the dashboard, the memory does not drop (unlikes the previous example).

Once G1GC takes memory, it is not very eager to give it back to the OS. It can do that in rare cases, here is one example or you can instruct it to do so.

Both ways are rather painful, instead there are GC algorithms that are smarter (and in general a lot better). My personal love goes to Shenandoah, let's see what it does. If I slightly change the code (so that I could prove my point better):

import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class HeapTest {

    public static void main(String[] args) throws Exception {

        // allocate 1/4 MB every 100 ms
        for (int i = 0; i < 6000; ++i) {
            byte[] b = new byte[1024 * 256];
            b[i] = 1;
            System.out.println(Arrays.hashCode(b));
            LockSupport.parkNanos(TimeUnit.of(ChronoUnit.MILLIS).toNanos(100));
        }

        while (true) {
            try {
                Thread.sleep(TimeUnit.of(ChronoUnit.SECONDS).toMillis(5));
                Thread.onSpinWait();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

    }
}

And run it with:

 java "-Xlog:gc*=debug" 
      "-Xlog:ergo*=debug" 
      "-Xlog:heap*=debug" 
       -XX:+UnlockExperimentalVMOptions 
       -XX:+UseShenandoahGC 
       -XX:+ShenandoahUncommit 
       -XX:ShenandoahGCHeuristics=compact  
       -Xmx1g 
       -Xms1m  
       HeapTest

Here is what you are going to see:

And you should, for a fact care about this:

This behavior is particularly disadvantageous in container environments where resources are paid by use. Even during phases where the VM only uses a fraction of its assigned memory resources due to inactivity, G1 will retain all of the Java heap. This results in customers paying for all resources all the time, and cloud providers not being able to fully utilize their hardware.

P.S. I would also add to that the fact that other pods are suffering too, because one pod decided to take as much memory as it could, at a particular spike, and never give it back.



回答2:

GC deals with a subset of memory used by process. There are regions of JVM memory which are not subject for garbage collection.

Below are few memory areas are not included in heap / metaspace

  • Thread Stack Space
  • Compressed Class Space
  • JIT compiled code
  • NIO direct buffer memory

List above is not full, these are just largest memory consumers.

Here is JVM memory hierarchy diagram with related config options.

In summary, actual JVM memory appetite is always large than heap limit.

How much large depends on nature of application and could be established empirically.

UPDATE

Java Native Memory Tracking could be enabled in JVM to provide detailed reports related to memory usage across different functional areas.