When I run my CUDA program, which allocates only a small amount of global memory (below 20 MB), I get an "out of memory" error. (From other people's posts, I think the problem is related to memory fragmentation.) I am trying to understand this problem, and I realize I have a couple of questions related to CUDA memory management.
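For reference, here is a minimal sketch of the kind of allocation that fails for me (the exact size and the plain cudaMalloc call are simplifications of my actual program):

```
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t bytes = 20 * 1024 * 1024;   /* ~20 MB, well below total memory */
    void *d_buf = NULL;

    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        /* This is where the "out of memory" error shows up,
           even though the request is small. */
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_buf);
    return 0;
}
```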
Is there a virtual memory concept in CUDA?
If only one kernel is allowed to run on CUDA at a time, will all of the memory it used or allocated be released after it terminates? If not, when does this memory get released?
If more than one kernel is allowed to run on CUDA, how can they make sure the memory they use does not overlap?
Can anyone help me answer these questions? Thanks.
Edit 1: Operating system: x86_64 GNU/Linux. CUDA version: 4.0. Device: GeForce 200 series; it is one of the GPUs attached to the machine, and I don't think it is a display device.
Edit 2: The following is what I have learned after doing some research. Feel free to correct me.
CUDA creates one context for each host thread. The context keeps track of information such as which portions of memory (pre-allocated or dynamically allocated) have been reserved for this application, so that other applications cannot write to them. When the application (not the kernel) terminates, this portion of memory is released.
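If this is right, it would also answer question 3 above: an allocation belongs to the context, not to a kernel launch, so it survives kernel termination and is only freed by cudaFree or when the context goes away. A small sketch of what I mean (the touch kernel is just a made-up example):

```
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void touch(int *p) { *p = 42; }

int main(void)
{
    int *d_p = NULL;
    cudaMalloc(&d_p, sizeof(int));    /* owned by this context */

    touch<<<1, 1>>>(d_p);
    cudaDeviceSynchronize();          /* the kernel has terminated here... */

    int h = 0;
    cudaMemcpy(&h, d_p, sizeof(int), cudaMemcpyDeviceToHost);
    printf("h = %d\n", h);            /* ...but the allocation is still valid */

    cudaFree(d_p);                    /* freed explicitly, or when the context
                                         is destroyed at application exit */
    return 0;
}
```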
CUDA memory is maintained by a linked list. When an application needs to allocate memory, it walks this list to see whether a contiguous memory chunk is available. If it fails to find such a chunk, an "out of memory" error is reported to the user even though the total available memory is greater than the requested amount. That is the problem related to memory fragmentation.
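If this picture is correct, a pattern like the following could trigger the error: fill memory with many chunks, free every other one, then ask for a block larger than any remaining hole. (This is only an illustration of the scenario as I understand it; the chunk sizes are arbitrary, and whether it actually fails depends on the driver's allocator.)

```
#include <stdio.h>
#include <cuda_runtime.h>

#define N_CHUNKS 64

int main(void)
{
    size_t chunk = 16 * 1024 * 1024;   /* 16 MB per chunk */
    void *bufs[N_CHUNKS];

    /* Fill device memory with many medium-sized chunks. */
    for (int i = 0; i < N_CHUNKS; ++i)
        if (cudaMalloc(&bufs[i], chunk) != cudaSuccess)
            bufs[i] = NULL;

    /* Free every other chunk: plenty of total memory is free,
       but it is scattered in 16 MB holes. */
    for (int i = 0; i < N_CHUNKS; i += 2) {
        cudaFree(bufs[i]);
        bufs[i] = NULL;
    }

    /* A single request larger than any hole may now fail with
       "out of memory" even though the total free memory suffices. */
    void *big = NULL;
    cudaError_t err = cudaMalloc(&big, 8 * chunk);
    printf("large allocation: %s\n", cudaGetErrorString(err));

    cudaFree(big);
    for (int i = 0; i < N_CHUNKS; ++i)
        cudaFree(bufs[i]);
    return 0;
}
```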
cuMemGetInfo will tell you how much memory is free, but not necessarily how much you can obtain in a single largest allocation, due to memory fragmentation.
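For example (driver API; the runtime equivalent is cudaMemGetInfo):

```
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    size_t free_bytes, total_bytes;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Reports total free bytes, not the largest contiguous block,
       so an allocation smaller than free_bytes can still fail. */
    cuMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MB, total: %zu MB\n",
           free_bytes >> 20, total_bytes >> 20);

    cuCtxDestroy(ctx);
    return 0;
}
```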
On the Windows Vista platform (WDDM), GPU memory virtualization is possible. That is, multiple applications can each allocate almost the whole GPU memory, and WDDM manages swapping data back to main memory.
New questions:
1. If the memory reserved by the context is fully released after the application terminates, memory fragmentation should not exist. There must be some kind of data left in memory.
2. Is there any way to restructure the GPU memory?