I am trying to solve a problem in which i am supposed to change a colour image to a greyscale image. For this purpose i am using CUDA parallel approach.
The kerne code i am invoking on the GPU is as follows.
__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{
int absolute_image_position_x = blockIdx.x;
int absolute_image_position_y = blockIdx.y;
if ( absolute_image_position_x >= numCols ||
absolute_image_position_y >= numRows )
{
return;
}
uchar4 rgba = rgbaImage[absolute_image_position_x + absolute_image_position_y];
float channelSum = .299f * rgba.x + .587f * rgba.y + .114f * rgba.z;
greyImage[absolute_image_position_x + absolute_image_position_y] = channelSum;
}
void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage,
uchar4 * const d_rgbaImage,
unsigned char* const d_greyImage,
size_t numRows,
size_t numCols)
{
//You must fill in the correct sizes for the blockSize and gridSize
//currently only one block with one thread is being launched
const dim3 blockSize(numCols/32, numCols/32 , 1); //TODO
const dim3 gridSize(numRows/12, numRows/12 , 1); //TODO
rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage,
d_greyImage,
numRows,
numCols);
cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
i see a line of dots in the first pixel line.
error i am getting is
libdc1394 error: Failed to initialize libdc1394
Difference at pos 51 exceeds tolerance of 5
Reference: 255
GPU : 0
my input/output images
Can anyone help me with this??? thanks in advance.
I don't think that this is a CUDA problem. libdc1394 is a library used to access IEEE1394 aka FireWire aka iLink video devices (DV camcorders, Apple iSight camera). That library doesn'r properly initialize, hence you're not getting usefull results. Basically it's NINO: Nonsens In Nonsens Out.
Since you are not aware of the image size. It is best to choose any reasonable dimension of the two-dimensional block of threads and then check for two conditions. The first one is that the
pos_x
andpos_y
indexes in the kernel do not exceednumRows
andnumCols
. Secondly the grid size should be just above the total number of threads in all the blocks.The libdc1394 error is not related to firewire etc in this case - it is the library that udacity is using to compare the image your program creates to the reference image. And what is is saying is that the difference between your image and the reference image has been been exceeded by a specific threshold, for that position ie. pixel.
Now, since I posted this question I have been continuously working on this problem
there are a couple of improvements that should be done in order to get this problem correct now I realize my initial solution was wrong .
Changes to be done:-
Secondly,
In the solution we are using a grid of numCols/16 * numCols/16
and blocksize of 24 * 24
code executed in 0.040576 ms
@datenwolf : thanks for answering above!!!
1-
int x =(blockIdx.x * blockDim.x) + threadIdx.x;
2-
int y = (blockIdx.y * blockDim.y) + threadIdx.y;
And in grid and block size
1-
const dim3 blockSize(32, 32, 1);
2-
const dim3 gridSize((numCols/32+1), (numRows/32+1) , 1);
Code executed in 0.036992 ms.