This question is very much related to:
A) How to separate CUDA code into multiple files
B) Link error LNK2005 when trying to compile several CUDA files together
Following the advice at https://meta.stackexchange.com/questions/42343/same-question-but-not-quite and https://meta.stackexchange.com/questions/8910/asking-a-similar-but-not-the-same-question, I am asking a very similar question, but I want to be absolutely clear about where my question differs from the ones linked above.
I was getting the linker errors from the title when including a header file that contained the definition of a __device__ function in multiple source files.
This is different from Link A), where the same errors occur with __global__ (kernel) functions, because according to the CUDA manual __device__ implies inline:
"In device code compiled for devices of compute capability 1.x, a __device__ function is always inlined by default. The __noinline__ function qualifier however can be used as a hint for the compiler not to inline the function if possible (see Section E.1)."
Link B) is more closely related (and one answer correctly points out that the function does not seem to get inlined, whatever the manual says), but Link B) refers to a header shipped by NVIDIA rather than my own header. The problem is far more likely to lie in my header file than in an NVIDIA one, so Link B) and my question probably have different answers.
In the meantime I have found out that declaring the function as __device__ inline solves the problem, so the above mainly documents the solution for the rest of the world.
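A minimal sketch of what such a header might look like is shown below. The file name device_math.h, the function square, and the DEVICE_FN macro are hypothetical and chosen only for illustration. The macro is there so the snippet also parses with a plain host C++ compiler; under nvcc it expands to __device__, so the declaration reads __device__ inline.

```cpp
// device_math.h (hypothetical header, included from several .cu files)
#ifndef DEVICE_MATH_H
#define DEVICE_MATH_H

// Assumption: DEVICE_FN is an illustrative macro, not part of CUDA.
// nvcc defines __CUDACC__, so there the function becomes `__device__ inline`.
#ifdef __CUDACC__
#define DEVICE_FN __device__
#else
#define DEVICE_FN
#endif

// The `inline` keyword is the change that made the duplicate-symbol
// (LNK2005) errors disappear when the header was included in more
// than one source file.
DEVICE_FN inline float square(float x) { return x * x; }

#endif  // DEVICE_MATH_H
```

Each .cu file would then include this header and call square from its own kernels.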
The open question is the reason for that behaviour.
Possible explanations I came up with:
- The manual is wrong.
- nvcc -arch=compute_11 does not qualify as "compiling for devices of compute capability 1.x", or there is a bug in nvcc.
- This is specific to MS Visual Studio and works on the platforms tested by NVIDIA.
- I have a severe misconception about how inline works. A non-CUDA example can be found here: Multiply defined linker error using inlined functions. My understanding is the one expressed by "caf" there, that "the compiler shouldn't generate an external definition of the function, so it shouldn't bother the linker", but others over there seemed to disagree.
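For reference, the inline behaviour in question can be sketched in plain C++ (not CUDA). The header name and function names below are hypothetical. In standard C++, a non-inline function defined in a header produces one external definition per including source file, which the linker rejects, whereas an inline function may be defined identically in every translation unit and the duplicates are merged. Whether nvcc applies the same rules to __device__ functions is the open point here.

```cpp
// util.h (hypothetical header, included from a.cpp and b.cpp)
#ifndef UTIL_H
#define UTIL_H

// Non-inline definition in a header: each including .cpp file emits its
// own external symbol, so linking the two object files fails with a
// multiple-definition error (LNK2005 on MSVC). Left commented out.
// int twice_plain(int x) { return 2 * x; }

// inline: identical definitions may appear in every translation unit,
// and the linker keeps a single copy.
inline int twice_inline(int x) { return 2 * x; }

// static: internal linkage, so each translation unit gets a private copy
// and no external symbol is emitted at all.
static int twice_static(int x) { return 2 * x; }

#endif  // UTIL_H
```

Both the inline and the static variant link cleanly when the header is included from several files; they differ in whether the copies are merged or kept separate.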
I'd greatly appreciate it if someone with more insight could clarify what is happening here.