Creating a buffer with pow(2, 24) elements and local_size_x = 64 in the layout input qualifier gives a highest WorkGroupID of 262143, which is as expected: pow(2, 24) / 64 - 1, since it is zero-indexed.
However, if we increase the global dimension (the number of elements, i.e. the size of the problem) to, say, pow(2, 25), WorkGroupID returns values that do not match this arithmetic, for no apparent reason.
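The expected values can be worked out on the host. This is only a minimal sketch of the arithmetic; the helper names `workGroupCount` and `lastWorkGroupId` are made up for illustration and do not come from the Vulkan API.

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed to cover `elementCount` items when each
// group holds `localSizeX` invocations (rounded up).
// Hypothetical helper, not part of Vulkan.
std::uint32_t workGroupCount(std::uint32_t elementCount, std::uint32_t localSizeX) {
    assert(localSizeX != 0);
    return elementCount / localSizeX + (elementCount % localSizeX != 0 ? 1u : 0u);
}

// Largest gl_WorkGroupID.x that should appear; IDs are zero-indexed.
// Hypothetical helper, not part of Vulkan.
std::uint32_t lastWorkGroupId(std::uint32_t elementCount, std::uint32_t localSizeX) {
    assert(elementCount != 0);
    return workGroupCount(elementCount, localSizeX) - 1u;
}
```

For `1u << 24` elements and a local size of 64 this gives 262144 groups and a last ID of 262143. For `1u << 25` the same arithmetic predicts 524288 groups and a last ID of 524287.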
Here are some of the device limits that I think matter:
maxStorageBufferRange: uint32_t = 4294967295
maxComputeSharedMemorySize: uint32_t = 32768
maxComputeWorkGroupCount: uint32_t[3] = 00000202898A8EC4
maxComputeWorkGroupCount[0]: uint32_t = 65535
maxComputeWorkGroupCount[1]: uint32_t = 65535
maxComputeWorkGroupCount[2]: uint32_t = 65535
maxComputeWorkGroupInvocations: uint32_t = 1024
maxComputeWorkGroupSize: uint32_t[3] = 00000202898A8ED4
maxComputeWorkGroupSize[0]: uint32_t = 1024
maxComputeWorkGroupSize[1]: uint32_t = 1024
maxComputeWorkGroupSize[2]: uint32_t = 1024
I am not allocating more elements than the device supports. Still, after 2 days and 16 hours I have not figured out what is going on...
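One way to compare a problem size with the limits listed above is sketched below. The host-side dispatch is not shown in this post, so the sketch assumes a buffer of 32-bit uints bound as a single range and a one-dimensional dispatch of elementCount / local_size_x groups along x. The constants are copied from the listing; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Limit values copied from the device listing above.
constexpr std::uint64_t kMaxStorageBufferRange          = 4294967295ull;
constexpr std::uint64_t kMaxComputeWorkGroupCountX      = 65535ull;
constexpr std::uint32_t kMaxComputeWorkGroupInvocations = 1024u;

// Assumption: the buffer holds `elementCount` 32-bit uints in one bound range.
bool bufferFitsStorageRange(std::uint64_t elementCount) {
    return elementCount * sizeof(std::uint32_t) <= kMaxStorageBufferRange;
}

// Assumption: 1-D dispatch, groups along x only, rounded up.
bool groupCountFitsX(std::uint64_t elementCount, std::uint32_t localSizeX) {
    assert(localSizeX != 0);
    const std::uint64_t groups = (elementCount + localSizeX - 1) / localSizeX;
    return groups <= kMaxComputeWorkGroupCountX;
}

// Assumption: local_size_y = local_size_z = 1, as in the shader in this post.
bool localSizeFits(std::uint32_t localSizeX) {
    return localSizeX <= kMaxComputeWorkGroupInvocations;
}
```

Under those assumptions a buffer of pow(2, 25) uints is 134217728 bytes, well inside maxStorageBufferRange, and a local size of 64 is inside maxComputeWorkGroupInvocations. The group counts of 262144 and 524288, however, are both larger than the maxComputeWorkGroupCount[0] value of 65535, so that is the limit worth checking against the actual dispatch call.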
WorkGroupSize, WorkGroupID, LocalInvocationID and GlobalInvocationID all show the same problem once I reach a certain number of elements. It is no wonder that GlobalInvocationID is affected too, given how it is calculated...
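For reference, GLSL defines gl_GlobalInvocationID as gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID, so any error in WorkGroupID carries straight through. A minimal host-side sketch of the x component follows; `globalInvocationIdX` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the GLSL definition for the x component only:
//   gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
// Hypothetical helper, not part of any API.
std::uint32_t globalInvocationIdX(std::uint32_t workGroupId,
                                  std::uint32_t workGroupSize,
                                  std::uint32_t localInvocationId) {
    // A local ID is always smaller than the work group size.
    assert(localInvocationId < workGroupSize);
    return workGroupId * workGroupSize + localInvocationId;
}
```

With a work group size of 64, the last invocation of work group 262143 maps to index 16777215, which is pow(2, 24) - 1.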
#version 450

// Size of the local work group is defined through the input layout qualifier
layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0) buffer deviceBuffer
{
    uint x[];
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    // Equivalent to (note the addition of the local ID, not a multiplication):
    //uint i = gl_WorkGroupSize.x * gl_WorkGroupID.x + gl_LocalInvocationID.x;

    //x[i] += x[i];

    // Number of work groups in the dispatch (not the number of work items)
    //x[i] = gl_NumWorkGroups.x;

    // Work group size specified in the input layout qualifier
    //x[i] = gl_WorkGroupSize.x;

    // Index of this work group, in the range [0, gl_NumWorkGroups.x - 1]
    x[i] = gl_WorkGroupID.x;

    //x[i] = gl_LocalInvocationID.x;
}