I'm trying to use CUDA Python with Numba. The code below is meant to calculate the sum of a 1D array, but I don't know how to get a single result rather than three values.
Environment: Python 3.5 with Numba and CUDA 8.0.
import os,sys,time
import pandas as pd
import numpy as np
from numba import cuda, float32
os.environ['NUMBAPRO_NVVM']=r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\bin\nvvm64_31_0.dll'
os.environ['NUMBAPRO_LIBDEVICE']=r'D:\NVIDIA GPU Computing Toolkit\CUDA\v8.0\nvvm\libdevice'
bpg = (1,1)
tpb = (1,3)
@cuda.jit
def calcu_sum(D, T):
    ty = cuda.threadIdx.y
    bh = cuda.blockDim.y
    index_i = ty
    L = len(D)
    su = 0
    # strided loop: thread ty visits ty, ty+bh, ty+2*bh, ...
    while index_i < L:
        su += D[index_i]
        index_i += bh
D = np.array([ 0.42487645,0.41607881,0.42027071,0.43751907,0.43512794,0.43656972,
0.43940639,0.43864551,0.43447691,0.43120232], dtype=np.float32)
T = np.empty([1,1])
print('D: ',D)
stream = cuda.stream()
with stream.auto_synchronize():
    dD = cuda.to_device(D, stream)
    dT = cuda.to_device(T, stream)
    calcu_sum[bpg, tpb, stream](dD, dT)
The output is:
D: [ 0.42487645 0.41607881 0.42027071 0.43751907 0.43512794 0.43656972
0.43940639 0.43864551 0.43447691 0.43120232]
su: 1.733004
su: 1.289852
su: 1.291317
T: 1.733004
T: 1.289852
T: 1.291317
Why do I get "1.733004 1.289852 1.291317" instead of "4.31417383"? The three values do add up to the expected total: 1.733004 + 1.289852 + 1.291317 = 4.314173.
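Where the three numbers come from, reproduced on the host with NumPy (no GPU needed): `tpb = (1, 3)` launches three threads, and the `while` loop makes thread `ty` visit indices `ty, ty + 3, ty + 6, ...`, which is the slice `D[ty::3]`. Each thread keeps its own local `su`, so the kernel ends with three partial sums and nothing combines them. The names `threads` and `partials` are illustrative only.

```python
import numpy as np

D = np.array([0.42487645, 0.41607881, 0.42027071, 0.43751907, 0.43512794,
              0.43656972, 0.43940639, 0.43864551, 0.43447691, 0.43120232],
             dtype=np.float32)

threads = 3  # tpb = (1, 3), so blockDim.y == 3
# Thread ty sums D[ty], D[ty + 3], D[ty + 6], ... i.e. the slice D[ty::3].
partials = [float(D[ty::threads].sum()) for ty in range(threads)]
print(partials)       # one partial sum per thread, like the three su values
print(sum(partials))  # the single total that was wanted
```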
I'm new to Numba and have read the Numba documentation, but I still don't know how to do this. Can anyone give some advice?