I am trying to set a flag in one kernel function and read it in another. Basically, I'm trying to do the following.
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>
#define FLAGCLEAR 0
#define FLAGSET 1
using namespace std;
__global__ void set_flag(int *flag)
{
*flag = FLAGSET;
// Wait for flag to reset.
while (*flag == FLAGSET);
}
__global__ void read_flag(int *flag)
{
// wait for the flag to set.
while (*flag != FLAGSET);
// Clear it for next time.
*flag = FLAGCLEAR;
}
int main(void)
{
// Setup memory for flag
int *flag;
cudaMalloc(&flag, sizeof(int));
// Setup streams
cudaStream_t stream0, stream1;
cudaStreamCreate(&stream0);
cudaStreamCreate(&stream1);
// Print something to let me know that we started.
cout << "Starting the flagging" << endl;
// do the flag test
set_flag <<<1,1,0,stream0>>>(flag);
read_flag <<<1,1,0,stream1>>>(flag);
// Wait for the streams
cudaDeviceSynchronize();
// Getting here is a painful process!
cout << "Finished the flagging" << endl;
// Clean UP!
cudaStreamDestroy(stream0);
cudaStreamDestroy(stream1);
cudaFree(flag);
}
I eventually get the second printout, but only after the computer freezes for 15 seconds, and I get both printouts at the same time. These streams are supposed to run in parallel, and not bog the system down. What am I doing wrong? How can I fix this?
Thanks.
EDIT
It seems as though a special case has been solved by adding volitile
but now something else has broken. If I add anything between the two kernel calls, the system reverts back to the old behavior, namely freezing and printing everything at once. This behavior is shown by adding sleep(2);
between set_flag
and read_flag
. Also, when put in another program, this causes the GPU to lock up. What am I doing wrong now?
Thanks again.