How do I find why the virtual memory foot print co

Posted 2019-06-08 17:43

Question:

I created a daemon which I use as a proxy to the Cassandra database. I call it snapdbproxy as it proxies my CQL commands from my other servers and tools.

Whenever I access that tool, it creates a new thread, manages various CQL commands, and then I cleanly exit the thread once the connection is lost.

Looking at the memory footprint, it grows really fast: the most active systems quickly reach gigabytes of virtual memory, and that starts using swap, which grows constantly. On startup, it is around 300 MB.

The software is written in C++ with destructors, RAII, smart pointers, etc... but I still verified:

  1. With -fsanitize=address (I use g++ under Linux), I get no leaks (okay, a very few: under 300 bytes, because I can't find how to get rid of a few Crypto buffers created by OpenSSL)

  2. With valgrind massif, which says I use 4.7 MB at initialization time and then under 4 MB ongoing (I ran the same code for over 1 h with the same results!)

Here is some output of ms_print (I removed the stack column, since it's all zeroes).

-------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)
-------------------------------------------------------------------

  0              0                0                0             0
  1     78,110,172        4,663,704        4,275,532       388,172
  2    172,552,798        3,600,840        3,369,538       231,302
  3    269,590,806        3,611,600        3,379,648       231,952
  4    350,518,548        3,655,208        3,420,483       234,725
  5    425,873,410        3,653,856        3,419,390       234,466
...
 67  4,257,283,952        3,693,160        3,459,545       233,615
 68  4,302,665,173        3,694,624        3,460,827       233,797
 69  4,348,046,440        3,693,728        3,457,524       236,204
 70  4,393,427,319        3,685,064        3,449,697       235,367
 71  4,438,812,133        3,698,352        3,461,918       236,434

As we can see, after one hour and many accesses from various other daemons (at least 100 accesses), valgrind tells me that I am using only around 4 MB of memory. I tried twice, thinking that the first attempt had probably failed. Same results.

So... I'm more or less out of ideas. Why would my process continue to grow in terms of virtual memory even though everything is correctly freed, both on exit of each thread (as shown by the massif output) and on exit of the entire process (as shown by -fsanitize=address)? Okay, I'm not showing the output of the sanitizer here, but trust me, it's under 300 bytes, not gigabytes of leaks.


Here is the output of a watch command after a while, as I'm looking at the virtual memory status:

Every 1.0s: grep ^Vm /proc/1773/status       Tue Oct  2 21:36:42 2018

VmPeak:  1124060 kB   <-- starts at under 300 MB...
VmSize:  1124060 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    108776 kB
VmRSS:    108776 kB
VmData:   963920 kB   <-- this tags along
VmStk:       132 kB
VmExe:      1936 kB
VmLib:     65396 kB
VmPTE:       888 kB   <-- this increases too (necessary to handle the large Vm)
VmPMD:        20 kB
VmSwap:        0 kB

VmPeak, VmSize, and VmData all increase each time the other daemons run (about once every 5 minutes).
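The same fields can also be sampled from inside the daemon instead of with watch and grep. The following is only a sketch: `parse_status_kb` and `own_status_kb` are hypothetical helper names, not part of snapdbproxy, and it assumes the Linux `/proc/<pid>/status` layout shown above (`Name:  value kB`).

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: return the value (in kB) of one "Vm..." field from
// text formatted like /proc/<pid>/status, or -1 if the field is missing.
long parse_status_kb(std::istream& in, const std::string& key)
{
    assert(!key.empty());
    const std::string prefix = key + ":";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            // The rest of the line looks like "  1124060 kB".
            std::istringstream fields(line.substr(prefix.size()));
            long kb = -1;
            fields >> kb;
            return fields ? kb : -1;
        }
    }
    return -1;
}

// Hypothetical helper: read the field for the current process (Linux only).
long own_status_kb(const std::string& key)
{
    std::ifstream status("/proc/self/status");
    return parse_status_kb(status, key);
}
```

Logging `own_status_kb("VmData")` each time a connection thread ends would show whether the growth lines up with thread lifetimes.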

However, the heap memory (malloc/free) is not changing. I am now logging sbrk(0) (an idea from 1201ProgramAlarm's comment, or at least my interpretation of the first part of it), and that address remains the same:

sbrk() = 0x4228000
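A minimal sketch of that kind of logging, assuming a POSIX system where `sbrk()` is available; `break_log_line` is a hypothetical name and not the code actually used in the daemon. Note that the program break only covers the brk-managed heap: memory obtained with `mmap()`, which is how glibc allocates thread stacks and large blocks, never moves it.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unistd.h>

// Hypothetical helper: format the current program break the way it is
// logged above, e.g. "sbrk() = 0x4228000".
std::string break_log_line()
{
    void* const current_break = sbrk(0);  // sbrk(0) only queries the break
    assert(current_break != reinterpret_cast<void*>(-1));
    std::ostringstream out;
    out << "sbrk() = " << current_break;
    return out.str();
}
```

A constant value here therefore rules out growth of the main heap, but says nothing about anonymous mappings.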

As suggested by phd, I looked at the contents of /proc/<pid>/maps over time. Here is a diff showing one or two increments. Unfortunately, I'm not told what creates these buffers. The only thing I could think of is my threads (i.e. the stack and a little space for the thread status).

--- a1  2018-10-02 21:50:21.887583577 -0700
+++ a2  2018-10-02 21:52:04.823169545 -0700
@@ -522,6 +522,10 @@
 59dd0000-5a5d0000 rw-p 00000000 00:00 0 
 5a5d0000-5a5d1000 ---p 00000000 00:00 0 
 5a5d1000-5add1000 rw-p 00000000 00:00 0 
+5add1000-5add2000 ---p 00000000 00:00 0 
+5add2000-5b5d2000 rw-p 00000000 00:00 0 
+5b5d2000-5b5d3000 ---p 00000000 00:00 0 
+5b5d3000-5bdd3000 rw-p 00000000 00:00 0 
 802001000-802b8c000 rwxp 00000000 00:00 0 
 802b8c000-802b8e000 ---p 00000000 00:00 0 
 802b8e000-802c8e000 rwxp 00000000 00:00 0 
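One way to test the thread-stack guess is to compute the size of each added mapping. The sketch below assumes the `start-end perms ...` layout of the lines in the diff; `mapping_size` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: size in bytes of one /proc/<pid>/maps entry such as
// "5add2000-5b5d2000 rw-p 00000000 00:00 0".
std::uint64_t mapping_size(const std::string& line)
{
    std::size_t dash = 0;
    // Parse the hexadecimal start address; parsing stops at the '-'.
    const std::uint64_t start = std::stoull(line, &dash, 16);
    // Parse the hexadecimal end address that follows the '-'.
    const std::uint64_t end = std::stoull(line.substr(dash + 1), nullptr, 16);
    assert(end >= start);
    return end - start;
}
```

Applied to the added lines, each `---p` entry is 0x1000 bytes (4 KiB) and each `rw-p` entry is 0x800000 bytes (8 MiB). That pattern is consistent with a guard page followed by a thread stack, since 8 MiB is a common default stack size on Linux, although the maps file itself does not say what created the mappings.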

Oh... Yep! My latest change, from detached threads to joinable ones, doesn't actually join the threads at all. Testing with a proper join now... and it works right! My bad!
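For reference, a joinable POSIX thread that has finished keeps its resources, including its stack mapping, until another thread joins it. A minimal sketch of the fixed pattern follows. It assumes plain `std::thread` rather than the daemon's own thread class, and `handle_connection` and `serve_connections` are hypothetical stand-ins for the real connection handling.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the work done on one proxied connection.
void handle_connection(std::atomic<int>& handled)
{
    ++handled;
}

// Start one thread per connection, then join every one of them.
// Returns the number of connections that were handled.
int serve_connections(int count)
{
    assert(count >= 0);
    std::atomic<int> handled{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        workers.emplace_back(handle_connection, std::ref(handled));
    }
    for (std::thread& worker : workers) {
        worker.join();  // joining lets the system release the thread's stack
    }
    return handled.load();
}
```

With `std::thread`, forgetting the join is hard to miss because destroying a joinable thread calls `std::terminate()`. With raw pthreads or a custom wrapper, nothing complains, and the only symptom is the kind of VmSize growth shown above.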