My application runs in a Docker container. It is written in Scala and runs on "OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)". Its Xmx is set to 16 GB and the container memory limit is 24 GB. After running for some time, the container is killed:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
However, I can't find any "java.lang.OutOfMemoryError: Java heap space" errors in the logs, not even once in the last 2 weeks across all 48 nodes, so this is unlikely to be a normal heap OOM.
dmesg output:
$ dmesg -l err,crit,alert,emerg
STDIN is not a terminal
[1647254.978515] Memory cgroup out of memory: Kill process 10924 (java) score 1652 or sacrifice child
[1647254.989138] Killed process 10924 (java) total-vm:34187148kB, anon-rss:24853120kB, file-rss:23904kB
[1655749.664871] Memory cgroup out of memory: Kill process 1969 (java) score 1652 or sacrifice child
[1655749.675513] Killed process 1969 (java) total-vm:35201940kB, anon-rss:24856624kB, file-rss:24120kB
[1655749.987605] Memory cgroup out of memory: Kill process 2799 (java) score 1656 or sacrifice child
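To put the kill lines above in perspective, here is a minimal sketch that parses the two complete "Killed process" entries and converts the resident set size to GiB. It assumes the 24 GB container limit means 24 GiB; `parse_kills` and `LIMIT_KB` are names made up for this illustration. Note that the cgroup also charges page cache and kernel memory, so the process RSS alone does not have to reach the limit for the kill to happen.

```python
import re

# The two complete "Killed process" lines copied from the dmesg output above.
DMESG = """\
[1647254.989138] Killed process 10924 (java) total-vm:34187148kB, anon-rss:24853120kB, file-rss:23904kB
[1655749.675513] Killed process 1969 (java) total-vm:35201940kB, anon-rss:24856624kB, file-rss:24120kB
"""

KB_PER_GIB = 1024 * 1024
# Assumption: the 24 GB container limit is 24 GiB.
LIMIT_KB = 24 * KB_PER_GIB

PATTERN = re.compile(
    r"Killed process (\d+) \((\S+)\) "
    r"total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB"
)


def parse_kills(text):
    """Return one dict per 'Killed process' line found in text."""
    kills = []
    for match in PATTERN.finditer(text):
        pid, name, total_vm, anon_rss, file_rss = match.groups()
        kills.append({
            "pid": int(pid),
            "name": name,
            "total_vm_kb": int(total_vm),
            "anon_rss_kb": int(anon_rss),
            "file_rss_kb": int(file_rss),
        })
    return kills


kills = parse_kills(DMESG)
for kill in kills:
    rss_kb = kill["anon_rss_kb"] + kill["file_rss_kb"]
    print(
        f"pid {kill['pid']} ({kill['name']}): "
        f"rss = {rss_kb / KB_PER_GIB:.2f} GiB, "
        f"{rss_kb / LIMIT_KB:.1%} of the assumed limit"
    )
```

In both kills the anonymous RSS is roughly 8 GB above the 16 GB heap, which is consistent with the growth being outside the Java heap.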
I then ran jcmd multiple times before the container was killed again, and the data looks like the following:
Native Memory Tracking:
Total: reserved=25505339KB, committed=25140947KB
Java Heap (reserved=16777216KB, committed=16777216KB) (mmap: reserved=16777216KB, committed=16777216KB)
Class (reserved=247996KB, committed=93500KB) (classes #14539) (malloc=2236KB #29794) (mmap: reserved=245760KB, committed=91264KB)
Thread (reserved=1013160KB, committed=1013160KB) (thread #1902) (stack: reserved=1003956KB, committed=1003956KB) (malloc=6240KB #9523) (arena=2964KB #3803)
Code (reserved=263255KB, committed=86131KB) (malloc=13655KB #20964) (mmap: reserved=249600KB, committed=72476KB)
GC (reserved=776174KB, committed=776174KB) (malloc=120814KB #164310) (mmap: reserved=655360KB, committed=655360KB)
Compiler (reserved=812KB, committed=812KB) (malloc=681KB #1823) (arena=131KB #3)
Internal (reserved=6366260KB, committed=6366256KB) (malloc=6366256KB #178778) (mmap: reserved=4KB, committed=0KB)
Symbol (reserved=18391KB, committed=18391KB) (malloc=16242KB #153138) (arena=2150KB #1)
Native Memory Tracking (reserved=9002KB, committed=9002KB) (malloc=186KB #2000) (tracking overhead=8816KB)
Arena Chunk (reserved=273KB, committed=273KB) (malloc=273KB)
Unknown (reserved=32800KB, committed=32KB) (mmap: reserved=32800KB, committed=32KB)
One thing I noticed is this section: Internal (reserved=6366260KB, committed=6366256KB)
It keeps growing, which pushes the total memory usage past the 24 GB limit.
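The arithmetic behind that observation can be checked with a short sketch that sums the committed figures per category from the report above and ranks them. Only the first "(reserved=..., committed=...)" group of each category is used, the 24 GB limit is again assumed to be 24 GiB, and the variable names are made up for this illustration.

```python
import re

# Category summary lines copied from the NMT report above
# (sub-lines such as mmap/malloc details are left out).
NMT = """\
Total: reserved=25505339KB, committed=25140947KB
Java Heap (reserved=16777216KB, committed=16777216KB)
Class (reserved=247996KB, committed=93500KB)
Thread (reserved=1013160KB, committed=1013160KB)
Code (reserved=263255KB, committed=86131KB)
GC (reserved=776174KB, committed=776174KB)
Compiler (reserved=812KB, committed=812KB)
Internal (reserved=6366260KB, committed=6366256KB)
Symbol (reserved=18391KB, committed=18391KB)
Native Memory Tracking (reserved=9002KB, committed=9002KB)
Arena Chunk (reserved=273KB, committed=273KB)
Unknown (reserved=32800KB, committed=32KB)
"""

KB_PER_GIB = 1024 * 1024
# Assumption: the 24 GB container limit is 24 GiB.
LIMIT_KB = 24 * KB_PER_GIB

# Matches "<Category> (reserved=NKB, committed=MKB)"; the "Total:" line
# has no parenthesised group and is therefore skipped.
CATEGORY = re.compile(
    r"^-?\s*([A-Za-z ]+?) \(reserved=(\d+)KB, committed=(\d+)KB\)",
    re.MULTILINE,
)

committed = {
    name: int(committed_kb)
    for name, _reserved_kb, committed_kb in CATEGORY.findall(NMT)
}

total_kb = sum(committed.values())
non_heap_kb = total_kb - committed["Java Heap"]
headroom_kb = LIMIT_KB - total_kb
ranked = sorted(committed.items(), key=lambda item: item[1], reverse=True)

for name, kb in ranked[:4]:
    print(f"{name:<12} {kb / KB_PER_GIB:6.2f} GiB  {kb / total_kb:6.1%}")
print(f"non-heap committed: {non_heap_kb / KB_PER_GIB:.2f} GiB")
print(f"headroom under assumed limit: {headroom_kb / 1024:.1f} MiB")
```

The per-category values add up exactly to the reported total of 25140947KB, and after the fully committed 16 GB heap, Internal is by far the largest remaining category.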
Has anyone seen a similar issue before? Does anyone know what Internal memory is here, and why it might keep growing without the memory being released?