This is a follow-up question to the selected answer in this post: Output of cuda program is not what was expected.
The function below works:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord[0] = 'H';
    myWord[1] = 'e';
    myWord[2] = 'l';
    myWord[3] = 'l';
    myWord[4] = 'o';
}
Why doesn't this work?
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord = "Hello\0";
}
You should start paying much more attention to the output from the compiler. Your second kernel compiles to a null kernel with nothing inside it.
The reason is that what you think is a string copy assignment is really just a pointer assignment. In this case the compiler is smart enough to know that nothing is ever written to memory through myWord, so it eliminates all the code and warns you that myWord isn't used.
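The same effect can be reproduced on the host in plain C. The following is only a minimal sketch: `set_by_pointer_assignment` is a hypothetical stand-in for the second kernel, and the `idx` parameter is an assumption that replaces the block-index arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for the second kernel. It rebinds a
   local pointer and never writes through it, so the caller's buffer is
   left untouched. */
void set_by_pointer_assignment(char **word, size_t idx)
{
    assert(word != NULL);
    const char *myWord = word[idx]; /* local copy of the pointer value   */
    myWord = "Hello";               /* changes where myWord points, only */
    (void)myWord;                   /* the new value is never used       */
}
```

After calling this on an array whose first entry points at a buffer holding "abcde", the buffer still holds "abcde" and the array entry still points at it. Since the function has no observable effect, an optimizing compiler is free to discard its body.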
If I were to ask a rhetorical question and rewrite the code this way:
would it be more obvious both why the code doesn't compile and why it could never "implicitly" perform a string copy assignment even if it did compile?
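If the goal is to get the characters into the buffer that word[...] points at, the copy has to be spelled out. Here is a minimal sketch in plain C. `copy_string`, `set_by_copy` and the `cap` capacity parameter are hypothetical names introduced for illustration, not part of the original code. In a CUDA build the helper would presumably need a `__device__` qualifier to be callable from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies at most cap - 1 characters from src into
   dst and always writes a terminating '\0'. Returns the number of
   characters copied, not counting the terminator. */
size_t copy_string(char *dst, const char *src, size_t cap)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t i = 0;
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i]; /* an actual write through the pointer */
        ++i;
    }
    dst[i] = '\0';
    return i;
}

/* Host-side analogue of the kernel: idx stands in for the block index,
   cap is the size in bytes of the buffer behind word[idx]. */
void set_by_copy(char **word, size_t idx, size_t cap)
{
    copy_string(word[idx], "Hello", cap);
}
```

Unlike the pointer assignment, every iteration here stores a byte into the destination buffer, so the compiler cannot drop the work.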
In your second version, the assignment myWord = "Hello\0"; does not store "Hello\0" in the space given by the **word parameter. The string is probably stored in the .rodata section of the executable. The assignment simply updates the myWord pointer; it does NOT do any bulk copying of data. (Though as talonmies points out, the compiler can figure out that the pointer update isn't needed at all, and optimizes away the entire function. Neat.)

In general, C doesn't provide any easy bulk-data copy mechanisms built into the language; the designers thought expensive things should look expensive. So, while PL/I makes assigning 0 to every element in a multidimensional array a very easy operation, A = 0;, C forces nested for() loops, or at least an explicit memset() call, to drive home the idea that it is expensive.

(Copying a struct, whether by assignment or by passing it as a function parameter, is the only exception to the bulk-copy rule.)
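Both points can be seen in a short plain C sketch. The names `Word`, `zero_grid`, `make_hello`, `ROWS` and `COLS` are hypothetical and chosen only for illustration. Zeroing the array takes an explicit call, while the struct is copied as a whole by ordinary assignment or return by value.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 3, COLS = 4 };

/* Hypothetical wrapper type: an array inside a struct is copied along
   with the struct. */
struct Word {
    char text[8];
};

/* There is no "grid = 0;" in C. For a contiguous array, one explicit
   memset() over the whole object does the job. */
void zero_grid(int grid[ROWS][COLS])
{
    assert(grid != NULL);
    memset(grid, 0, sizeof(int[ROWS][COLS]));
}

/* Returning a struct by value copies all of its bytes to the caller. */
struct Word make_hello(void)
{
    struct Word w = { "Hello" };
    return w;
}
```

For example, after struct Word a = make_hello(); struct Word b = a; the two objects are independent: changing b.text[0] to 'J' leaves a.text as "Hello".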