Reduction of matrix rows in OpenCL

Posted 2019-10-17 10:10

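I have a matrix stored on the GPU as a one-dimensional array, and I am trying to write an OpenCL kernel that performs a reduction on each row of this matrix. For example: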

Let's say my matrix is 2x3 with elements [1, 2, 3, 4, 5, 6]. What I want to do is:

[1, 2, 3] = [ 6]
[4, 5, 6]   [15]

Obviously, since I am talking about a reduction, the actual result could contain more than one element per row:

[1, 2, 3] = [3, 3]
[4, 5, 6]   [9, 6]

The final computation could then be done in another kernel or on the CPU.

So far, what I have is a kernel that does the reduction, but over all the elements of the array, like this:

[1, 2, 3] = [21]
[4, 5, 6]

The reduction kernel that does this is the following one (I actually got it from here on Stack Overflow):

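__kernel void
sum2(__global float *inVector, __global float *outVector,
     const unsigned int inVectorSize, __local float *resultScratch)
{
  const unsigned int localId = get_local_id(0);
  const unsigned int workGroupSize = get_local_size(0);

  // Each work-item loads one element into local memory; work-items past
  // the end of the input load 0 so they do not change the sum.
  if (get_global_id(0) < inVectorSize)
    resultScratch[localId] = inVector[get_global_id(0)];
  else
    resultScratch[localId] = 0;

  // Tree reduction in local memory (workGroupSize must be a power of two):
  // at each step the lower half of the work-items adds in the upper half.
  for (unsigned int a = workGroupSize >> 1; a > 0; a >>= 1)
  {
    barrier(CLK_LOCAL_MEM_FENCE);
    if (a > localId)
      resultScratch[localId] += resultScratch[localId + a];
  }

  // One partial sum per work-group; these still have to be added up.
  if (localId == 0)
    outVector[get_group_id(0)] = resultScratch[0];
  barrier(CLK_LOCAL_MEM_FENCE);
}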

Answer 1:

I think one solution is to modify your reduction kernel so that it can reduce just a part of the array.

__kernel void
sum2(__global float *inVector,
     __global float *outVector,
     unsigned int   inVectorOffset,
     unsigned int   inVectorSize,
     __local float  *resultScratch)
{
  const unsigned int localId = get_local_id(0);
  const unsigned int workGroupSize = get_local_size(0);

  // Same as before, but loads start at inVectorOffset, so one launch only
  // sees the inVectorSize elements of a single row.
  if (get_global_id(0) < inVectorSize)
    resultScratch[localId] = inVector[inVectorOffset + get_global_id(0)];
  else
    resultScratch[localId] = 0;

  for (unsigned int a = workGroupSize >> 1; a > 0; a >>= 1)
  {
    barrier(CLK_LOCAL_MEM_FENCE);
    if (a > localId)
      resultScratch[localId] += resultScratch[localId + a];
  }

  // Every launch writes to outVector[0 .. numGroups-1], so read these
  // results (or pass a different output buffer) before the next row.
  if (localId == 0)
    outVector[get_group_id(0)] = resultScratch[0];
  barrier(CLK_LOCAL_MEM_FENCE);
}

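Then you can reduce one row of the matrix by passing the index where your row starts as inVectorOffset and the number of elements in the row (i.e. the number of columns) as inVectorSize.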



Source: Reduction of matrix rows in OpenCL