Is it valid to call cudaFree after some asynchronous calls? For example:
int* dev_a;
// prepare dev_a...
// launch a kernel to process dev_a (asynchronously)
cudaFree(dev_a);
In this case, since the kernel launch is asynchronous, the kernel may not have finished running by the time cudaFree is reached. Will the cudaFree(dev_a) call immediately after the launch destroy the data while the kernel is still using it?
As per Jared's comment, I am about 99% certain that the CUDA driver's free/malloc pair is implemented as blocking calls that synchronize the context they operate on before they execute. If so, the kernel will have finished by the time the memory is actually released.
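If you would rather not depend on that behaviour, here is a minimal sketch that synchronizes explicitly before freeing. The kernel `increment`, the helper `run_example`, and the buffer size are made up for illustration and are not from the question; only the CUDA runtime calls (`cudaMalloc`, `cudaMemset`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaFree`) are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for "process dev_a".
__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
int run_example() {
    const int n = 1024;  // assumed size, for illustration only
    int* dev_a = nullptr;

    // prepare dev_a
    if (cudaMalloc((void**)&dev_a, n * sizeof(int)) != cudaSuccess) {
        return 1;
    }
    if (cudaMemset(dev_a, 0, n * sizeof(int)) != cudaSuccess) {
        cudaFree(dev_a);
        return 1;
    }

    // launch a kernel to process dev_a (asynchronously)
    increment<<<(n + 255) / 256, 256>>>(dev_a, n);

    // Catch launch errors, then wait for the kernel to finish so that
    // correctness does not rely on cudaFree synchronizing implicitly.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    // The kernel is no longer running at this point.
    cudaError_t freeErr = cudaFree(dev_a);

    return (err == cudaSuccess && freeErr == cudaSuccess) ? 0 : 1;
}

int main() {
    int rc = run_example();
    printf(rc == 0 ? "ok\n" : "CUDA error\n");
    return rc;
}
```

The explicit cudaDeviceSynchronize() is probably redundant if cudaFree does block as described above, but it makes the ordering obvious to anyone reading the code and gives you a place to check for errors raised while the kernel was executing.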