cuda, pycuda — how to write complex numbers — error

Published 2019-08-22 07:41

Question:

I am having difficulty using complex numbers in CUDA/PyCUDA.

I have this in C:

#include <complex>
typedef std::complex<double> cmplx;
....
cmplx j(0.,1.);   

Also, in the same code:

#include <boost/python.hpp>
#include <boost/array.hpp>
...
typedef std::vector< boost::array<std::complex<double>,3 > > ComplexFieldType;
typedef std::vector< boost::array<double,3> > RealFieldType;
...
__global__ void compute(RealFieldType const & Rs,ComplexFieldType const & M,..)
...

How can I convert this to use it with PyCUDA? I tried something like this (following the book 'CUDA by Example'):

struct cuComplex {
    float real;
    float imag;
    cuComplex(float a, float b) : real(a), imag(b) {}
    cuComplex operator*(const cuComplex& a) {
        return cuComplex(real * a.real - imag * a.imag, imag * a.real + real * a.imag);
    }
    cuComplex operator+(const cuComplex& a) {
        return cuComplex(real + a.real, imag + a.imag);
    }
};

cuComplex j(0.,1.);    //instead of  cmplx j(0.,1.);  

__global__ void compute(float *Rs, cuComplex *M, ..)  //instead of RealFieldType const & Rs, ComplexFieldType const & M
....

Some of the errors I get are:

data member initializer is not allowed

this declaration has no storage class or type specifier

Thank you!

---------------------EDIT----------------------------------------------

Relative to the above, I did the following using #include <pycuda-complex.hpp>:

pycuda::complex<float> cmplx;

cmplx j(0.,1.);

As for typedef std::vector< boost::array<std::complex<double>,3 > > ComplexFieldType;

and ComplexFieldType const & M inside the global function, I tried just "float *M" or "cmplx *M".

So far, I am getting the error:

variable "cmplx" is not a type name

If I use pycuda::complex cmplx;, then I get:

identifier "cmplx" is undefined

name followed by "::" must be a class or namespace name

Also:

expression must have pointer-to-object type (but maybe this is from another part of the code)

Answer 1:

It really isn't clear what you are actually trying to do (if you actually know yourself), and the question is getting progressively more confused as the edits and comments roll on. But to expand Andreas's answer a little, here is a simple, compilable piece of CUDA code which uses the pycuda native complex type correctly:

#include <pycuda-complex.hpp>

template<typename T>
__global__ void kernel(const T * x, const T *y, T *z)
{
    int tid = threadIdx.x + blockDim.x * blockIdx.x;

    z[tid] = x[tid] + y[tid];
}


typedef pycuda::complex<float> scmplx;
typedef pycuda::complex<double> dcmplx;

template void kernel<float>(const float *, const float *, float *);
template void kernel<double>(const double *, const double *, double *);
template void kernel<scmplx>(const scmplx *, const scmplx *, scmplx *);
template void kernel<dcmplx>(const dcmplx *, const dcmplx *, dcmplx *);

This gives you single and double real and complex versions of the trivial kernel and compiles with nvcc something like this:

$ nvcc -arch=sm_20 -Xptxas="-v" -I$HOME/pycuda-2011.1.2/src/cuda -c scmplx.cu 
ptxas info    : Compiling entry function '_Z6kernelIN6pycuda7complexIdEEEvPKT_S5_PS3_' for 'sm_20'
ptxas info    : Used 12 registers, 44 bytes cmem[0], 168 bytes cmem[2], 4 bytes cmem[16]
ptxas info    : Compiling entry function '_Z6kernelIN6pycuda7complexIfEEEvPKT_S5_PS3_' for 'sm_20'
ptxas info    : Used 8 registers, 44 bytes cmem[0], 168 bytes cmem[2]
ptxas info    : Compiling entry function '_Z6kernelIdEvPKT_S2_PS0_' for 'sm_20'
ptxas info    : Used 8 registers, 44 bytes cmem[0], 168 bytes cmem[2]
ptxas info    : Compiling entry function '_Z6kernelIfEvPKT_S2_PS0_' for 'sm_20'
ptxas info    : Used 4 registers, 44 bytes cmem[0], 168 bytes cmem[2]

Perhaps this goes some way to answering your question.



Answer 2:

Use

#include <pycuda-complex.hpp>

{
  pycuda::complex<float> x(5, 17);
}

Same interface as std::complex<>, in fact derived from the STLport version of that.