I am trying to convert an RGBA buffer into ARGB. Is there any way to improve the following algorithm, or any other faster way to perform this operation? Note that the alpha value does not matter once the data is in the ARGB buffer, and it should always end up as 0xFF.
int y, x;
unsigned int pixel; /* unsigned, so that pixel << 16 cannot overflow into the sign bit */
for (y = 0; y < height; y++)
{
    for (x = 0; x < width; x++)
    {
        pixel = rgbaBuffer[y * width + x];
        argbBuffer[(height - y - 1) * width + x] = (pixel & 0xff00ff00)
                                                 | ((pixel << 16) & 0x00ff0000)
                                                 | ((pixel >> 16) & 0xff);
    }
}
Assuming the code is not buggy (just inefficient), I guess that all you want to do is swap the even-numbered bytes (bytes 0 and 2) of every pixel (and, of course, flip the buffer vertically). Is that right?
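So you can achieve some optimizations by computing the row offsets once per row instead of once per pixel, and by copying the individual bytes instead of masking and shifting whole pixels.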
Note that the more complex index calculation is performed in the outer loop only. There are four accesses to both rgbaBuffer and argbBuffer for each pixel, but I think this is more than offset by avoiding the bitwise operations and the index calculations. An alternative would be (as in your code) to fetch/store one pixel (int) at a time and do the processing locally (this saves memory accesses), but unless you have some efficient way to swap the two bytes and set the alpha locally (e.g. some inline assembly, so that you make sure everything is performed at register level), it won't really help.
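A minimal C sketch of the byte-wise approach described above might look like this. It assumes byte buffers of `width * height * 4` bytes, that bytes 0 and 2 of each pixel are the ones to swap (which is what the original expression does on a little-endian machine), that byte 3 is the alpha, and that rows are flipped vertically as in the question. The function name `convert_bytes` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical byte-wise converter: row pointers are computed once per row,
 * and each pixel is handled with plain byte copies instead of masks/shifts.
 * Assumes 4 bytes per pixel, bytes 0 and 2 swapped, byte 3 = alpha. */
void convert_bytes(const unsigned char *rgbaBuffer, unsigned char *argbBuffer,
                   int width, int height)
{
    size_t rowBytes = (size_t)width * 4;
    for (int y = 0; y < height; y++) {
        /* Index calculation done in the outer loop only (vertical flip). */
        const unsigned char *src = rgbaBuffer + (size_t)y * rowBytes;
        unsigned char *dst = argbBuffer + (size_t)(height - y - 1) * rowBytes;
        for (int x = 0; x < width; x++, src += 4, dst += 4) {
            dst[0] = src[2];  /* byte 2 -> byte 0 */
            dst[1] = src[1];  /* byte 1 stays in place */
            dst[2] = src[0];  /* byte 0 -> byte 2 */
            dst[3] = 0xFF;    /* alpha is always forced to 0xFF */
        }
    }
}
```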
I will focus only on the swap function.
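A hypothetical `swap_pixel` helper in that spirit could look like this. It is essentially the question's expression applied to an unsigned 32-bit value, which keeps `p << 16` well-defined (shifting a set bit into the sign bit of a signed `int` is undefined behaviour), plus forcing the top byte (assumed here to be the alpha) to 0xFF as the question requires.

```c
#include <stdint.h>

/* Hypothetical helper: swaps bits 0-7 with bits 16-23 of a 32-bit pixel,
 * keeps bits 8-15, and forces the top byte (assumed alpha) to 0xFF. */
static inline uint32_t swap_pixel(uint32_t p)
{
    return 0xFF000000u                   /* alpha always 0xFF */
         | (p & 0x0000FF00u)             /* middle byte stays in place */
         | ((p << 16) & 0x00FF0000u)     /* low byte moves up to bits 16-23 */
         | ((p >> 16) & 0x000000FFu);    /* bits 16-23 move down to the low byte */
}
```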
Use assembly. The following example is for Intel (x86) CPUs and swaps red and blue.
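One well-known way to swap red and blue in a register on x86 is a `bswap` followed by a rotate right by 8. The C sketch below illustrates that idea; it is not necessarily the instruction sequence the answer used. It assumes GCC or Clang (for `__builtin_bswap32`) and a pixel loaded as 0xAABBGGRR (RGBA bytes read on a little-endian machine). The name `swap_red_blue` is hypothetical; whether the compiler actually emits `bswap`/`ror` depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: swaps the red and blue bytes of a 32-bit pixel and
 * keeps alpha and green. Requires GCC/Clang for __builtin_bswap32. */
static inline uint32_t swap_red_blue(uint32_t p)
{
    uint32_t t = __builtin_bswap32(p);  /* 0xAABBGGRR -> 0xRRGGBBAA */
    return (t >> 8) | (t << 24);        /* rotate right by 8 -> 0xAARRGGBB */
}
```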
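The code you provided is very strange, since it does not shuffle the color components from RGBA to ARGB but from RGBA to RABG: reading each pixel as the 32-bit value 0xRRGGBBAA, the mask 0xff00ff00 keeps R and B in place, A moves up into bits 16-23 and G moves down into the lowest byte, giving 0xRRAABBGG.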
I've made a correct and optimized version of this routine.
The first thing I did was simplify your shuffling expression: XRGB is simply RGBA >> 8. I also removed the array index calculation from each iteration and used pointers as loop variables. This version is about 2 times faster than the original on my machine.
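A minimal sketch of what such a routine could look like, assuming (as the answer does) that each pixel is the 32-bit value 0xRRGGBBAA, so that `>> 8` yields 0x00RRGGBB. OR-ing in 0xFF000000 then sets the alpha to 0xFF, as the question asks. The vertical flip of the original is kept, the destination row pointer is computed once per row, and the inner loop walks pointers. The name `rgba_to_argb` is hypothetical, and no timing claim is made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: 0xRRGGBBAA -> 0xFFRRGGBB, rows flipped vertically.
 * Pointers are the loop variables; no per-pixel index arithmetic. */
void rgba_to_argb(const uint32_t *rgbaBuffer, uint32_t *argbBuffer,
                  int width, int height)
{
    const uint32_t *src = rgbaBuffer;
    for (int y = 0; y < height; y++) {
        /* Destination row computed once per row (bottom-up). */
        uint32_t *dst = argbBuffer + (size_t)(height - 1 - y) * (size_t)width;
        const uint32_t *srcEnd = src + width;
        while (src < srcEnd)
            *dst++ = (*src++ >> 8) | 0xFF000000u;  /* XRGB with X = 0xFF */
    }
}
```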
You can also use SSE for the shuffling if this code is intended for an x86 CPU.
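With the `>> 8` formulation no byte shuffle is actually needed, so plain SSE2 (a logical shift and an OR applied to four pixels at once) is enough; for arbitrary byte reorderings, SSSE3's `_mm_shuffle_epi8` (`pshufb`) is the usual tool. The sketch below assumes the same 0xRRGGBBAA pixel layout, uses SSE2 only when the compiler defines `__SSE2__`, and otherwise falls back to the scalar loop. `convert_row_sse` is a hypothetical name that converts a single row; the caller would loop over rows and handle the vertical flip.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical row converter: 0xRRGGBBAA -> 0xFFRRGGBB for n pixels.
 * SSE2 path handles 4 pixels per iteration; the scalar loop does the rest. */
void convert_row_sse(const uint32_t *src, uint32_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i alpha = _mm_set1_epi32((int)0xFF000000u);
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_or_si128(_mm_srli_epi32(v, 8), alpha);  /* logical >> 8, set alpha */
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    for (; i < n; i++)
        dst[i] = (src[i] >> 8) | 0xFF000000u;
}
```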