I have implemented the following class using CUDA streams:

```
class CudaStreams
{
private:
    int nStreams_;
    cudaStream_t* streams_;
    cudaStream_t active_stream_;

public:
    // default constructor
    CudaStreams() { }

    // streams initialization
    void InitStreams(const int nStreams = 1) {
        nStreams_ = nStreams;
        // allocate and initialize an array of stream handles
        streams_ = (cudaStream_t*) malloc(nStreams_*sizeof(cudaStream_t));
        for(int i = 0; i < nStreams_; i++) CudaSafeCall(cudaStreamCreate(&(streams_[i])));
        active_stream_ = streams_[0];
    }

    // default destructor
    ~CudaStreams() {
        for(int i = 0; i<nStreams_; i++) CudaSafeCall(cudaStreamDestroy(streams_[i]));
    }
};
```
If I now run this simple code:

```
void main( int argc, char** argv)
{
    streams.InitStreams(1);
    streams.~CudaStreams();
    cudaDeviceReset();
}
```
after the `cudaDeviceReset()` call, I receive the following message:

```
Unhandled exception 0x772f15de in test.exe: 0x00000000.
```

What should I do before invoking the destructor to avoid this issue when using `cudaDeviceReset()`?
EDIT
If I add `free(streams_);` in the destructor, namely

```
~CudaStreams() {
    for(int i = 0; i<nStreams_; i++) CudaSafeCall(cudaStreamDestroy(streams_[i])); // *
    free(streams_);
}
```
I receive the following error message:

```
cudaSafeCall() failed at C:\Users\Documents\Project\Library\CudaStreams.cuh:79 : unknown error
```

where line 79 is the one marked `*` in the destructor.
Furthermore, if I use the same instructions as the constructor and the destructor directly inside the code, namely

```
void main( int argc, char** argv)
{
    int nStreams_ = 3;
    cudaStream_t* streams_ = (cudaStream_t*) malloc(nStreams_*sizeof(cudaStream_t));
    for(int i = 0; i < nStreams_; i++) CudaSafeCall(cudaStreamCreate(&(streams_[i])));
    for(int i = 0; i<nStreams_; i++) CudaSafeCall(cudaStreamDestroy(streams_[i]));
    free(streams_);
    cudaDeviceReset();
}
```

everything works well. Perhaps it is something connected to a bad use of the class?
There are two problems here, both related to the destructor of your class and scope.
Firstly, let's start with a version of your `main()` that works correctly: declare `streams` inside its own block, `{ CudaStreams streams; streams.InitStreams(1); }`, and call `cudaDeviceReset()` only after that block has closed. This works because the destructor for `streams` is called exactly once (when `streams` falls out of scope at the end of the block), and before `cudaDeviceReset()` is called.

Your original `main()` (or a compilable version of it, but more about that later...) fails for two reasons. In it you explicitly call the destructor for `streams` (which you should almost never do), then `cudaDeviceReset()`, and then the destructor is called a second time, implicitly, when `main()` returns and `streams` falls out of scope. That automatic destructor call after the context has been destroyed is the source of the segfault/exception: the `cudaStreamDestroy` calls are trying to work on streams without a valid CUDA context. So the solution is not to let any class that makes CUDA API calls fall out of scope (or have its destructor called explicitly) when there is no context.

Now suppose we made a third version that keeps the explicit `streams.~CudaStreams()` call, but puts it, together with the declaration of `streams`, inside an inner block that closes before `cudaDeviceReset()`.
You will then get a CUDA runtime error, because the destructor gets called twice. The first (explicit) call works. The second (implicit, when `streams` falls out of scope) produces a runtime error: you still have a valid context, but are now trying to destroy streams that no longer exist.
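The ordering difference between the working version and the original one can be checked without a GPU. Below is a minimal host-only C++ sketch, not CUDA code: the hypothetical `MockContext` stands in for the CUDA context, its `reset()` for `cudaDeviceReset()`, and the hypothetical `MockStreams` plays the role of `CudaStreams`, logging whether its destructor ran while the context was still alive. The sketch leaves out the explicit destructor call on purpose: destroying an automatic object explicitly and then letting it go out of scope runs the destructor twice, which standard C++ treats as undefined behavior, so that case cannot be modeled reliably.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA context (not a CUDA API):
// reset() plays the role of cudaDeviceReset().
struct MockContext {
    bool alive = true;
    std::vector<std::string> log;

    void reset() {
        alive = false;
        log.push_back("reset");
    }
};

// Hypothetical stand-in for CudaStreams: the destructor "destroys" the
// streams, which is only valid while the context is still alive.
class MockStreams {
public:
    explicit MockStreams(MockContext& ctx) : ctx_(ctx) {}
    MockStreams(const MockStreams&) = delete;             // one owner per set of streams
    MockStreams& operator=(const MockStreams&) = delete;

    ~MockStreams() {
        ctx_.log.push_back(ctx_.alive ? "destroy: ok" : "destroy: no context");
    }

private:
    MockContext& ctx_;
};

// Working layout: the object lives in its own block, which closes
// before the reset.
std::vector<std::string> scoped_version() {
    MockContext ctx;
    {
        MockStreams streams(ctx);
        // ... work would be launched on the streams here ...
    }   // ~MockStreams() runs here, while the context is alive
    ctx.reset();
    return ctx.log;
}

// Failing layout: the object and the reset share a scope, so the
// destructor only runs at the closing brace, after the reset.
std::vector<std::string> same_scope_version() {
    MockContext ctx;
    {   // stands in for the body of the question's main()
        MockStreams streams(ctx);
        ctx.reset();
    }   // ~MockStreams() runs here, with no context left
    return ctx.log;
}
```

Calling `scoped_version()` returns `{"destroy: ok", "reset"}`, while `same_scope_version()` returns `{"reset", "destroy: no context"}` (for example, `assert(scoped_version().front() == "destroy: ok");` holds). In the real program the second ordering is the one in which `cudaStreamDestroy` runs without a context; letting the object's lifetime, rather than an explicit destructor call, decide when cleanup happens is what makes the block-scoped version safe.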
As a final comment/question: how hard would it have been to post an actual compilable version of the code you showed in your original question? It literally required 5 extra lines to turn it into a proper repro case that someone else could actually compile and run. I find it a bit unreasonable to expect others to make an effort to answer what are basically debugging questions if you are not willing to make a similar effort in providing useful code and information, which makes everyone's life that much easier. Think about it. [end of rant]