I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:
clock_t begin = clock();
fooKernel<<< x, y >>>( z, w );
clock_t end = clock();
// Print time difference: ( end - begin )
Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I record the time with cudaEventRecord(), or is that redundant?
I am asking because there is another call, cudaEventSynchronize(), which seems to wait until the event is recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished executing?
Actually, there are even more synchronization functions (for example cudaStreamSynchronize()). The CUDA programming guide describes in detail what each of them does. Using events as timers basically comes down to this:
//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);
//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait until event2 has completed on the GPU!
//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);
//release events
cudaEventDestroy(event1);
cudaEventDestroy(event2);
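It's important to synchronize on event2 because you want to make sure everything has executed before calculating the time. Because both events and the kernel are in the same stream, where order is preserved, event1 and the kernel have completed as well once event2 has. The elapsed time is computed from timestamps the GPU takes when each event completes, so calling cudaEventSynchronize() late on the host does not add extra time to dt_ms.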
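A self-contained version of the snippet above might look like the sketch below. The kernel scaleKernel, the helper timeScaleKernelMs and the CUDA_CHECK macro are hypothetical names introduced only for illustration; the sketch also checks the return codes of the runtime calls and of the launch itself.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical example kernel: scales every element of x by a.
__global__ void scaleKernel(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Times one launch of scaleKernel with CUDA events on the default stream
// and returns the elapsed GPU time in milliseconds.
float timeScaleKernelMs(int n) {
    float* d_x = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Both events and the kernel go into stream 0, so they execute in order.
    CUDA_CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_x, 2.0f, n);
    CUDA_CHECK(cudaGetLastError());  // reports launch-configuration errors
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Block the host until 'stop' (and therefore the kernel) has completed.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_x));
    return ms;
}
```

It could be called as `std::printf("kernel took %.3f ms\n", timeScaleKernelMs(1 << 20));`. The very first kernel launch in a process can include one-time setup costs, so doing an untimed warm-up launch first is a common precaution.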
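You could call cudaStreamSynchronize(0) or even cudaThreadSynchronize() instead, but both are overkill in this case: they wait for all pending work in the stream or on the whole device rather than just for event2. (cudaThreadSynchronize() is deprecated; cudaDeviceSynchronize() replaces it.) Likewise, no cudaThreadSynchronize() is needed before cudaEventRecord(): the event is enqueued in the stream and completes only after the work submitted before it.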
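To check the question's concern about a delayed cudaEventSynchronize() empirically, the sketch below deliberately sleeps on the host after recording the stop event and only then synchronizes. The names addOneKernel and eventTimeWithHostDelayMs are hypothetical, and error checking is omitted for brevity. If the events measure GPU-side completion as described, the returned time should stay around the kernel's run time, well below the host delay.

```cuda
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical example kernel: adds 1 to every element of x.
__global__ void addOneKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

// Records events around a kernel, sleeps on the host for hostDelayMs,
// then synchronizes and returns the event-measured time in milliseconds.
// (Error checking omitted for brevity.)
float eventTimeWithHostDelayMs(int n, int hostDelayMs) {
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));
    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Untimed warm-up launch so one-time setup costs are not measured.
    addOneKernel<<<grid, block>>>(d_x, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    addOneKernel<<<grid, block>>>(d_x, n);
    cudaEventRecord(stop, 0);

    // Simulate a host that is busy and synchronizes late.
    std::this_thread::sleep_for(std::chrono::milliseconds(hostDelayMs));

    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return ms;
}
```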