I have a C++ process that ingests large blocks of data and stores them in memory. The storage array contains roughly 10 GB of data partitioned into 4 MB blocks. As new data arrives, it creates a new block and then deletes an old block if it is full. This process cycles through the full circular buffer once every 10 to 60 seconds. We are running on x86_64 RH5 and RH6 and compiling with the Intel 14 compiler.
We are seeing a problem where the overall process memory usage grows over time until the OS runs out of memory and eventually the box dies. We have been looking for memory leaks and running the process through TotalView trying to determine where the memory is going and are not seeing any reported leaks.
On the heap report produced by TotalView we saw the 10 GB of allocated memory for the stored data, but we also saw 4+ GB of "deallocated" memory. Looking through the heap display, it appeared that our heap was very fragmented: large chunks of "allocated" memory were interspersed with large chunks of "deallocated" memory.
Is the "deallocated" memory memory that has been freed by my process but not reclaimed by the OS, and is it reasonable to think that this may be the source of our memory "leak"?
If so, how do I get the OS to reclaim the memory?
Do we need to rework our process to reuse discarded data blocks instead of relying on the OS to do our memory management for us?
I guess (and hope, for your sake) that you are on Linux (if not, and porting your code to Linux is doable, consider it, since Linux has good tools for such issues).
Then:
use C++11 (or C++14) and learn about move semantics, smart pointers, and the rule of five.
use valgrind
use some sanitizers from your recent GCC or Clang/LLVM compiler. Read about the -fsanitize= ... debugging options; you probably want -fsanitize=address at least during debugging.
The above will help you catch some remaining memory leaks. Be prepared to spend weeks on them. You might need to disable ASLR and you should learn about gdb watchpoints.
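To illustrate the smart pointer and move semantics advice, here is a minimal sketch of a circular buffer that owns its blocks through std::unique_ptr, so that overwriting a slot releases the old block without any explicit delete. The names Block, BlockRing and kBlockSize are hypothetical, and the 4 MB size is only taken from the question; this is not the poster's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical names: Block, BlockRing and kBlockSize are illustrative only.
constexpr std::size_t kBlockSize = 4u * 1024u * 1024u;  // 4 MB, as in the question

struct Block {
    // The array is owned by the unique_ptr and freed when the Block dies.
    std::unique_ptr<unsigned char[]> data{new unsigned char[kBlockSize]};
    std::size_t used = 0;  // bytes filled so far
};

class BlockRing {
public:
    explicit BlockRing(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);  // an empty ring would make the modulo below invalid
    }

    // Store a block in the next slot. Whatever block previously lived in that
    // slot is destroyed by the unique_ptr move assignment.
    void push(std::unique_ptr<Block> b) {
        slots_[next_] = std::move(b);
        next_ = (next_ + 1) % slots_.size();
    }

    // Number of slots that currently hold a block.
    std::size_t live() const {
        std::size_t n = 0;
        for (const auto& s : slots_) {
            if (s) ++n;
        }
        return n;
    }

private:
    std::vector<std::unique_ptr<Block>> slots_;
    std::size_t next_ = 0;
};
```

With ownership expressed this way, a leak would require somebody to call release() on a pointer or to keep a block alive elsewhere, which is much easier to search for than a missing delete.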
You might also consider using Boehm's conservative garbage collector. See this for using it in standard C++ containers. If you do use Boehm's GC, you had better use it nearly everywhere in your program ...
Genuine fragmentation may happen (even if you are sure to have avoided memory leaks, and have checked that e.g. with valgrind), in particular for long-lived processes. In such cases, you might consider having your own application checkpointing facilities (which are also useful to restart a long-lived computation). If you have thought about it early enough (checkpointing should be an early architectural design decision!) you could checkpoint your state to disk once in a while (e.g. every hour) and restart a fresh process. This can be a good memory compacting strategy.
You could (but I don't necessarily recommend it) write your own memory allocator on top of the OS primitives that change the virtual address space, like mmap(2) (perhaps with MAP_HUGETLB ....) & munmap; you might have your own allocator and deallocator (at least for large-sized objects, or have operator new & operator delete, etc..., in some of your classes); read about the C++ allocator concept. But your standard new and delete (and malloc & free for C code, often used by C++ new & delete) are already using them.
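As a rough sketch of what a dedicated allocator for large blocks could look like, the following move-only wrapper obtains each block directly with mmap(2) and gives it back with munmap, so the pages return to the kernel as soon as the block is destroyed. The class name MappedBlock is hypothetical, MAP_HUGETLB is deliberately left out, and error handling is kept minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <sys/mman.h>

// Hypothetical class: one anonymous private mapping per block.
class MappedBlock {
public:
    explicit MappedBlock(std::size_t bytes) : size_(bytes) {
        assert(bytes > 0);  // mmap rejects a zero length
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        data_ = p;
    }

    ~MappedBlock() { release(); }

    // Rule of five: copying is disabled, moving transfers the mapping.
    MappedBlock(const MappedBlock&) = delete;
    MappedBlock& operator=(const MappedBlock&) = delete;

    MappedBlock(MappedBlock&& o) noexcept : data_(o.data_), size_(o.size_) {
        o.data_ = nullptr;
        o.size_ = 0;
    }

    MappedBlock& operator=(MappedBlock&& o) noexcept {
        if (this != &o) {
            release();
            data_ = o.data_;
            size_ = o.size_;
            o.data_ = nullptr;
            o.size_ = 0;
        }
        return *this;
    }

    unsigned char* data() const { return static_cast<unsigned char*>(data_); }
    std::size_t size() const { return size_; }

private:
    // Unmap the pages, handing them straight back to the kernel.
    void release() noexcept {
        if (data_ != nullptr) {
            munmap(data_, size_);
            data_ = nullptr;
            size_ = 0;
        }
    }

    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Anonymous mappings start out zero-filled, and each one costs a system call to create and another to destroy, so this only makes sense for large blocks such as the 4 MB ones in the question.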
Notice that most calls to free or delete do not invoke munmap, but simply mark the released memory as reusable by future malloc or new ...
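With glibc's malloc, one knob that may change this behavior for large blocks is mallopt(3) with M_MMAP_THRESHOLD: requests at or above the threshold that cannot be satisfied from the free list are served by mmap(2), and freeing such a chunk unmaps it. The sketch below assumes glibc; the helper name use_mmap_for_blocks_of is hypothetical, and whether it actually helps on a given RH5 or RH6 system has to be measured.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // glibc-specific: mallopt, M_MMAP_THRESHOLD

// Hypothetical helper: ask glibc malloc to use mmap for requests of at least
// `bytes`. Setting the threshold explicitly also turns off glibc's dynamic
// adjustment of it. Returns true when mallopt reports success.
inline bool use_mmap_for_blocks_of(std::size_t bytes) {
    assert(bytes <= static_cast<std::size_t>(INT_MAX));  // mallopt takes an int
    return mallopt(M_MMAP_THRESHOLD, static_cast<int>(bytes)) == 1;
}
```

A plausible use would be calling use_mmap_for_blocks_of(1024 * 1024) once at startup, before the first 4 MB block is allocated.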
You definitely should become more familiar with garbage collection techniques and terminology. Read the GC handbook.
See also mallinfo(3) & mallopt(3) & proc(5) (perhaps use /proc/self/maps or /proc/self/smaps & /proc/self/statm from inside your program to learn about your heap, or the pmap command). Maybe strace(1) could be useful (to understand what syscalls(2) happen).
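For example, a process can sample its own memory footprint by reading /proc/self/statm, whose first two fields are the total program size and the resident set size, both in pages (see proc(5)). This is only a sketch; StatmSnapshot, read_statm and resident_bytes are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <unistd.h>

// Hypothetical struct holding the first two fields of /proc/self/statm.
struct StatmSnapshot {
    long total_pages = 0;     // total program size, in pages
    long resident_pages = 0;  // resident set size, in pages
    bool ok = false;          // true when both fields were parsed
};

// Read the current values for this process (Linux only).
inline StatmSnapshot read_statm() {
    StatmSnapshot s;
    std::ifstream in("/proc/self/statm");
    if (in >> s.total_pages >> s.resident_pages) {
        s.ok = true;
    }
    return s;
}

// Convert the resident page count to bytes using the system page size.
inline long resident_bytes(const StatmSnapshot& s) {
    assert(s.ok);
    return s.resident_pages * sysconf(_SC_PAGESIZE);
}
```

Logging resident_bytes(read_statm()) once per pass through the circular buffer should show whether the footprint keeps growing or levels off.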