Analyze Kubernetes pod OOMKilled

Posted 2020-06-20 09:51

Question:

We are getting OOMKilled events on our K8s pods. When such an event occurs, we want to run a native memory analysis command BEFORE the pod is evicted. Is it possible to add such a hook?

More specifically: we run the JVM with the -XX:NativeMemoryTracking=summary flag. We want to run jcmd <pid> VM.native_memory summary.diff just BEFORE pod eviction to see what causes the OOM.

Answer 1:

It looks like this is almost impossible to handle.

Based on an answer on GitHub about stopping gracefully on an OOM kill:

It is not possible to change OOM behavior currently. Kubernetes (or runtime) could provide your container a signal whenever your container is close to its memory limit. This will be on a best effort basis though because memory spikes might not be handled on time.

Here is what the official documentation says:

If the node experiences a system OOM (out of memory) event prior to the kubelet is able to reclaim memory, the node depends on the oom_killer to respond. The kubelet sets a oom_score_adj value for each container based on the quality of service for the Pod.

So, as you can see, there is not much chance of handling it. Here is a long article about OOM handling; I will quote just a small part of it, about out-of-memory handling in the memory controller:

Unfortunately, there may not be much else that this process can do to respond to an OOM situation. If it has locked its text into memory with mlock() or mlockall(), or it is already resident in memory, it is now aware that the memory controller is out of memory. It can't do much of anything else, though, because most operations of interest require the allocation of more memory.

The only thing I can suggest is to get data from cAdvisor (where you can also see OOM killer events) or from the Kubernetes API, and to run your command when the metrics show that you are very close to running out of memory. I am not sure you will have time to do anything after you receive an OOM killer event.
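As a rough illustration of the "run your command when you are close to the limit" idea, here is a minimal Python sketch that polls the container's own cgroup memory files instead of cAdvisor or the Kubernetes API. Everything in it is an assumption rather than a tested recipe: the 90% threshold, the 5-second polling interval, and the function names are made up for illustration, and it assumes the script runs in the same container and as the same user as the JVM so that jcmd can attach. Note also that summary.diff only has something to compare against if a baseline was recorded earlier with jcmd <pid> VM.native_memory baseline. As the answer says, this is best effort: a fast memory spike can still trigger the OOM killer between two polls.

```python
import subprocess
import time
from pathlib import Path

# cgroup v2 layout as seen from inside the container. On cgroup v1 the files
# are memory/memory.usage_in_bytes and memory/memory.limit_in_bytes instead.
CGROUP_DIR = Path("/sys/fs/cgroup")
THRESHOLD = 0.90     # assumed trigger point: 90% of the memory limit
POLL_SECONDS = 5     # assumed polling interval


def read_usage_and_limit(cgroup_dir=CGROUP_DIR):
    """Return (usage_bytes, limit_bytes); limit is None when unlimited."""
    v2_current = cgroup_dir / "memory.current"
    if v2_current.exists():
        usage = int(v2_current.read_text().strip())
        raw_limit = (cgroup_dir / "memory.max").read_text().strip()
        limit = None if raw_limit == "max" else int(raw_limit)
    else:
        v1 = cgroup_dir / "memory"
        usage = int((v1 / "memory.usage_in_bytes").read_text().strip())
        limit = int((v1 / "memory.limit_in_bytes").read_text().strip())
    return usage, limit


def should_dump(usage, limit, threshold=THRESHOLD):
    """True when usage has reached the given fraction of the limit."""
    if not limit:
        return False
    return usage / limit >= threshold


def dump_native_memory(pid):
    """Run the NMT diff command from the question and return its output."""
    result = subprocess.run(
        ["jcmd", str(pid), "VM.native_memory", "summary.diff"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def watch(pid):
    """Poll until the threshold is crossed, then print one NMT diff."""
    while True:
        usage, limit = read_usage_and_limit()
        if should_dump(usage, limit):
            # Printing to stdout puts the report in the container log,
            # which is still readable after the container is killed.
            print(dump_native_memory(pid), flush=True)
            return
        time.sleep(POLL_SECONDS)
```

Calling watch(<pid>) with the JVM's process ID would start the loop. Writing the report to the container log rather than to a file inside the container is a deliberate choice here, since files in the container's writable layer are lost when it is restarted.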



Tags: kubernetes