How to correctly identify and correct a memory leak

Posted 2019-07-15 06:02

Question:

We run a Debian server with 64 GB of RAM for large Python simulations.

The problem is that a large amount of this memory is being consumed, and we don't know why or how to fix it.

It does not appear to be buffers or page cache:

free -m
             total       used       free     shared    buffers     cached
Mem:         64454      56243       8211         20          6        113
-/+ buffers/cache:      56122       8332
Swap:        21051       5834      15217
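To double-check that this memory is not cache, the "used" figure on free's "-/+ buffers/cache" line can be recomputed directly from /proc/meminfo; older procps versions of free derive it as used minus buffers minus cached. A minimal Python sketch, assuming a Linux /proc/meminfo with values in kB; the function names are illustrative and not part of any tool.

```python
# Minimal sketch: recompute free's "-/+ buffers/cache" used figure from /proc/meminfo.
# Assumption: Linux /proc/meminfo with values in kB (e.g. "MemFree:  8407040 kB").
import os


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_excluding_cache_kb(fields):
    """Memory that is neither free nor buffers/page cache, in kB."""
    used = fields["MemTotal"] - fields["MemFree"]
    return used - fields["Buffers"] - fields["Cached"]


# Only read the real file when it exists (i.e. on Linux).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        kb = used_excluding_cache_kb(parse_meminfo(f.read()))
    print("used excluding buffers/cache: %d MB" % (kb // 1024))
```

With the free -m numbers above (64454 total, 8211 free, 6 buffers, 113 cached, all in MB), the same arithmetic gives 56124 MB, which matches the 56122 MB that free reports up to rounding.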

Running smem shows that after a few days, up to 37 GB is allocated to kernel dynamic memory:

Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory         36.8G     431.0M      36.4G
userspace memory               4.5G     149.7M       4.4G
free memory                   21.6G      21.6G          0
----------------------------------------------------------
                              62.9G      22.2G      40.8G
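A large "Noncache" value for kernel dynamic memory is often worth checking against unreclaimable slab first, so a useful next step is to see which kernel-side counters in /proc/meminfo hold the memory. A minimal Python sketch, assuming the documented /proc/meminfo field names (Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, VmallocUsed) with values in kB; counters absent on a given kernel are reported as 0. Slab is the sum of SReclaimable and SUnreclaim, so those three should not be added together.

```python
# Minimal sketch: show which kernel-side /proc/meminfo counters hold the memory.
# Assumption: values are in kB; counters missing on this kernel are reported as 0.
# Note: Slab = SReclaimable + SUnreclaim, so do not add those three together.
import os

KERNEL_FIELDS = ["Slab", "SReclaimable", "SUnreclaim",
                 "KernelStack", "PageTables", "VmallocUsed"]


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_breakdown_mb(fields):
    """Return (field, MB) pairs for kernel-side counters, largest first."""
    pairs = [(name, fields.get(name, 0) // 1024) for name in KERNEL_FIELDS]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Only read the real file when it exists (i.e. on Linux).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        for name, mb in kernel_breakdown_mb(parse_meminfo(f.read())):
            print("%-14s %8d MB" % (name, mb))
```

If SUnreclaim accounts for most of the 36.4 GB, the memory is held in slab caches the kernel cannot drop on its own; if none of these counters is large, the allocations are happening somewhere /proc/meminfo does not itemize.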

We rebooted the server yesterday. Right after boot, kernel dynamic memory was at 1.5 GB, but it grows steadily.

24 hours later, it has already reached 17 GB:

Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory         17.1G     269.3M      16.8G 
userspace memory              36.4G      73.0M      36.3G 
free memory                    9.4G       9.4G          0 
----------------------------------------------------------
                              62.9G       9.8G      53.2G 
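The growth rate itself is a useful diagnostic: going from 1.5 GB to 17.1 GB in about 24 hours is (17.1 - 1.5) / 24 = 0.65 GB per hour. Sampling one counter at fixed intervals makes it possible to line that growth up with when simulations start and stop. A minimal Python sketch, assuming SUnreclaim is the counter that grows (a hypothetical default; substitute whichever counter actually grows); the 10-minute interval and 6 samples are arbitrary placeholders.

```python
# Minimal sketch: sample one /proc/meminfo counter and report its growth rate.
# Assumptions: "SUnreclaim" is a hypothetical default (use whichever counter is
# growing); the 10-minute interval and 6 samples are arbitrary placeholders.
import time


def growth_rate(samples):
    """Average growth per hour between the first and last (seconds, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / ((t1 - t0) / 3600.0)


def read_field_kb(field, path="/proc/meminfo"):
    """Return the value (in kB) of one /proc/meminfo field."""
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name == field:
                return int(rest.split()[0])
    raise KeyError(field)


def monitor(field="SUnreclaim", interval=600, count=6):
    """Print the counter and its running growth rate every `interval` seconds."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval)
        samples.append((time.time(), read_field_kb(field)))
        if len(samples) > 1:
            rate_mb = growth_rate(samples) / 1024  # kB per hour -> MB per hour
            print("%s: %d kB, %+.1f MB/h" % (field, samples[-1][1], rate_mb))
```

On the server, monitor() prints one line per sample. Feeding it the two smem readings, growth_rate([(0, 1.5), (24 * 3600, 17.1)]) gives about 0.65 (GB per hour).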

How can we investigate this further, and if it really is a memory leak, what should we do? (The kernel is 3.16.)

Thanks in advance