I am using CUDA/Thrust/CUDPP. As I understand it, in stream compaction certain items in an array are marked as invalid and then "removed".
Now what does "removal" really mean here? Suppose the original array is A and has length 6. If 2 elements are invalid (by whatever condition we provide), then:
Does the system create a new array of size 4 in GPU memory to store the valid elements and produce the final result?
Or does it physically remove the invalid elements from memory and shrink the original array A down to size 4, keeping only the valid elements?
In either case, doesn't that mean that dynamic memory allocation is happening under the hood? But I have heard that dynamic memory allocation is not possible in the CUDA world.
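To make the two possibilities I am asking about concrete, here is a minimal host-side C++ sketch using only the standard library. It is not Thrust or CUDPP code and I am not claiming either library behaves this way; the names `is_valid`, `compact_to_new` and `compact_in_place`, and the "negative means invalid" condition, are hypothetical and only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical validity condition: negative values count as "invalid".
bool is_valid(int x) { return x >= 0; }

// Possibility 1: copy the valid elements into a separate output array.
// The input array is left untouched; the output holds only valid items.
std::vector<int> compact_to_new(const std::vector<int>& a) {
    std::vector<int> out;
    std::copy_if(a.begin(), a.end(), std::back_inserter(out), is_valid);
    return out;
}

// Possibility 2: shift the valid elements to the front of the same array
// and return how many there are. The array's storage is NOT resized here;
// the elements past the returned count are simply left over.
std::size_t compact_in_place(std::vector<int>& a) {
    auto new_end = std::remove_if(a.begin(), a.end(),
                                  [](int x) { return !is_valid(x); });
    return static_cast<std::size_t>(new_end - a.begin());
}
```

With A = {3, -1, 4, -1, 5, 9}, the first function gives a separate array of 4 elements, while the second leaves A at length 6 with the 4 valid elements in front and reports the count 4. My question is which of these (if either) corresponds to what happens on the GPU.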