I'm working with CUDA and I have created an int2_ class to deal with complex integer numbers.
The class is declared in the ComplexTypes.h file as follows:
namespace LibraryNameSpace
{
    class int2_ {
    public:
        int x;
        int y;
        // Constructors
        __host__ __device__ int2_(const int, const int);
        __host__ __device__ int2_();
        // etc.
        // Assignments from other types
        __host__ __device__ const int2_& operator=(const int);
        __host__ __device__ const int2_& operator=(const float);
        // etc.
    };
}
The class is implemented in the ComplexTypes.cpp file as follows:
#include "ComplexTypes.h"
__host__ __device__ LibraryNameSpace::int2_::int2_(const int x_,const int y_) { x=x_; y=y_;}
__host__ __device__ LibraryNameSpace::int2_::int2_() {}
// etc.
__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const int a) { x = a; y = 0; return *this; }
__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const float a) { x = (int)a; y = 0; return *this; }
// etc.
Everything works well. In the main function (which includes ComplexTypes.h) I can work with int2_ numbers.
In the CudaMatrix.cu file, I'm now including ComplexTypes.h and defining and explicitly instantiating the __global__ function:
template <class T1, class T2>
__global__ void evaluation_matrix(T1* data_, T2* ob, int NumElements)
{
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < NumElements) data_[i] = ob[i];
}
template __global__ void evaluation_matrix(LibraryNameSpace::int2_*, int*, int);
The situation of the CudaMatrix.cu file seems to be symmetric to that of the main function. Nevertheless, the compiler complains:
Error 19 error : Unresolved extern function '_ZN16LibraryNameSpace5int2_aSEi' C:\Users\Documents\Project\Test\Testing_Files\ptxas simpleTest
Please consider that:
- Before moving the implementation to separate files, everything was working correctly when including both declarations and implementations in the main file.
- The problematic instruction is data_[i] = ob[i].
Does anyone have an idea of what is going on?
The procedure I have followed in my post above has two issues:
- The ComplexTypes.cpp filename must be changed to ComplexTypes.cu so that nvcc can process the CUDA keywords __device__ and __host__. This was pointed out by Talonmies in his comment. Actually, before posting, I had already tried changing the extension from .cpp to .cu, but the compiler kept showing the same error, so I naively reverted the change.
- In Visual Studio 2010, one has to use View -> Property Pages; Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true). This is necessary for separate compilation, because the kernel in CudaMatrix.cu calls device functions that are defined in ComplexTypes.cu. Indeed, the NVIDIA CUDA Compiler Driver NVCC documentation states that: