I'm considering writing Haskell bindings to a quantum mechanics library written in C++ and CUDA (I'd write a plain C wrapper for the bindings). A major bottleneck is always the GPU memory used by the CUDA parts. In C++, this is handled quite efficiently because all objects have automatic memory management, i.e. they are destroyed as soon as they leave scope. I also use C++11 move semantics to avoid copies; those obviously wouldn't be necessary in Haskell anyway.
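To make the C++ side concrete, here is a minimal sketch of the pattern I mean. It is not the actual library code: `DeviceBuffer`, `make_state`, `demo` and the byte counter are made up for illustration, and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the snippet runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Stand-in for GPU memory accounting (hypothetical). A real wrapper would
// call cudaMalloc/cudaFree where std::malloc/std::free appear below.
static std::size_t g_bytes_in_use = 0;

class DeviceBuffer {  // hypothetical name, not from the real library
public:
    explicit DeviceBuffer(std::size_t bytes)
        : ptr_(std::malloc(bytes)), bytes_(bytes) {
        if (!ptr_) throw std::bad_alloc();
        g_bytes_in_use += bytes_;
    }

    // The destructor runs as soon as the object leaves scope.
    ~DeviceBuffer() { release(); }

    // Copying device memory is expensive, so it is disabled here.
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    // Moving only transfers ownership of the pointer; nothing is copied.
    DeviceBuffer(DeviceBuffer&& other) noexcept
        : ptr_(other.ptr_), bytes_(other.bytes_) {
        other.ptr_ = nullptr;
        other.bytes_ = 0;
    }

    DeviceBuffer& operator=(DeviceBuffer&& other) noexcept {
        if (this != &other) {
            release();  // free whatever this object currently owns
            ptr_ = other.ptr_;
            bytes_ = other.bytes_;
            other.ptr_ = nullptr;
            other.bytes_ = 0;
        }
        return *this;
    }

    std::size_t size() const { return bytes_; }

private:
    void release() {
        if (ptr_) {
            std::free(ptr_);
            g_bytes_in_use -= bytes_;
            ptr_ = nullptr;
            bytes_ = 0;
        }
    }

    void* ptr_;
    std::size_t bytes_;
};

// Returning by value moves (or elides) the buffer instead of copying it.
DeviceBuffer make_state(std::size_t bytes) {
    DeviceBuffer tmp(bytes);
    return tmp;
}

// Usage: memory is released deterministically at the closing brace.
void demo() {
    {
        DeviceBuffer a = make_state(1024);
        DeviceBuffer b = std::move(a);   // ownership moves, no second allocation
        assert(g_bytes_in_use == 1024);
    }                                    // b is destroyed here
    assert(g_bytes_in_use == 0);
}
```

The point is that the memory comes back at a precisely known place in the program, without any collector having to notice that the object is dead.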
Yet I'm concerned that this might not work as smoothly once the objects are managed from garbage-collected Haskell, and that I might need to come up with heuristics to migrate seldom-used objects back to host memory (which tends to be quite slow). Is this fear reasonable, or is GHC's garbage collection so effective that most objects will vanish almost as quickly as in C++, even when the Haskell runtime doesn't see that it needs to be economical with memory? Are there any tricks to help, or ways to signal that some objects take up too much GPU memory and should be freed as quickly as possible?