In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc
once the application has "warmed up". Moreover, I use custom kernels for basic array filling, arithmetic operations between arrays, etc. and would like to simplify my code by using Thrust
and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now) and I'd like to use device_vector
and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<>
all the time, somewhat cluttering up my code.
My rather vague question: How would/do you organize the usage of custom memory management, Thrust's array methods and calls to custom kernels in the most readable way?
As with all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid the overhead that may be incurred by many individual calls to the underlying allocation scheme, in this case cudaMalloc.

A simple way to provide device_vector with a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:
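A minimal sketch of what such an allocator might look like follows. It assumes a Thrust release that still ships thrust/device_malloc_allocator.h and a build with nvcc; the message strings are illustrative placeholders, and the calls to the base class stand in for whatever sub-allocation scheme you would actually use.

```cuda
#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

// Sketch: a custom allocator that overrides only allocate and deallocate.
template <typename T>
struct my_allocator : thrust::device_malloc_allocator<T>
{
  // Shorthand for the base class.
  typedef thrust::device_malloc_allocator<T> super_t;

  // Reuse the base class's typedefs. Because the base is
  // device_malloc_allocator, pointer is thrust::device_ptr<T>.
  typedef typename super_t::pointer   pointer;
  typedef typename super_t::size_type size_type;

  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): " << n << " elements" << std::endl;

    // Defer to the base class here; a real pool would hand out
    // a slice of a pre-allocated buffer instead.
    return super_t::allocate(n);
  }

  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): " << n << " elements" << std::endl;

    // Defer to the base class; a real pool would return the slice to itself.
    super_t::deallocate(p, n);
  }
};

int main()
{
  // The allocator is passed as device_vector's second template parameter.
  thrust::device_vector<int, my_allocator<int> > vec;

  // Create 10 ints, each initialized to 13.
  vec.resize(10, 13);

  return 0;
}
```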
Here's the output:
In this example, note that we hear from my_allocator::allocate() once upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope as it destroys its elements.
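If you want to check this call pattern without a GPU, a host-side analogue is one option. The sketch below is not Thrust code: it swaps in std::vector and a from-scratch allocator (rather than the inheritance approach), and the counters g_allocs and g_deallocs and the function run_example are hypothetical names added only so the calls can be counted.

```cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical counters, used only to observe how often the allocator is called.
static int g_allocs = 0;
static int g_deallocs = 0;

// A minimal C++11 allocator written from scratch (no base class).
template <typename T>
struct my_allocator
{
  typedef T value_type;

  my_allocator() = default;

  // Converting constructor required by the allocator requirements.
  template <typename U>
  my_allocator(const my_allocator<U>&) noexcept {}

  T* allocate(std::size_t n)
  {
    std::cout << "my_allocator::allocate(): " << n << " elements\n";
    ++g_allocs;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t n) noexcept
  {
    std::cout << "my_allocator::deallocate(): " << n << " elements\n";
    ++g_deallocs;
    ::operator delete(p);
  }
};

// All instances are interchangeable because the allocator holds no state.
template <typename T, typename U>
bool operator==(const my_allocator<T>&, const my_allocator<U>&) { return true; }

template <typename T, typename U>
bool operator!=(const my_allocator<T>&, const my_allocator<U>&) { return false; }

// Mirrors the Thrust example: an empty vector, one resize, then destruction.
void run_example()
{
  std::vector<int, my_allocator<int> > vec;
  vec.resize(10, 13);
}  // vec goes out of scope here and releases its storage
```

With libstdc++ or libc++, calling run_example() once should leave both counters at 1, matching the single allocate and single deallocate described above. Other standard library implementations, or debug modes that allocate bookkeeping objects, may behave differently.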