Fast Converting RGBA to ARGB

Posted 2019-01-27 01:43

I am trying to convert an RGBA buffer into ARGB. Is there any way to improve the following algorithm, or is there a faster way to perform this operation? Note that the alpha value is not important once the data is in the ARGB buffer, and it should always end up as 0xFF.

int y, x, pixel;

for (y = 0; y < height; y++)
{
    for (x = 0; x < width; x++)
    {
        pixel = rgbaBuffer[y * width + x];
        argbBuffer[(height - y - 1) * width + x] = (pixel & 0xff00ff00) | ((pixel << 16) & 0x00ff0000) | ((pixel >> 16) & 0xff);
    }
}

4 answers
我想做一个坏孩纸
#2 · 2019-01-27 02:05

Assuming that the code is not buggy (just inefficient), I guess that all you want to do is swap the two even-numbered bytes of each pixel (bytes 0 and 2) and flip the buffer vertically. Is that right?

So you can achieve some optimizations by:

  • Avoiding the shift and masking operations
  • Optimizing the loop, e.g. economizing on the index calculations

I would rewrite the code as follows:

int y, x;

for (y = 0; y < height; y++)
{
    unsigned char *pRGBA= (unsigned char *)(rgbaBuffer+y*width);
    unsigned char *pARGB= (unsigned char *)(argbBuffer+(height-y-1)*width);
    for (x = 4*(width-1); x>=0; x-=4)
    {
        pARGB[x  ]   = pRGBA[x+2];
        pARGB[x+1]   = pRGBA[x+1];
        pARGB[x+2]   = pRGBA[x  ];
        pARGB[x+3]   = 0xFF;
    }
}

Note that the more complex index calculation is performed in the outer loop only. There are four accesses to each of rgbaBuffer and argbBuffer per pixel, but I think this is more than offset by avoiding the bitwise operations and the index calculations. An alternative would be (as in your code) to fetch and store one pixel (int) at a time and do the processing locally, which economizes on memory accesses. But unless you have some efficient way to swap the two bytes and set the alpha locally (e.g. some inline assembly, so that everything is performed at register level), it won't really help.
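The alternative mentioned above, one 32-bit load and store per pixel with the row pointers still computed once per row, might look like the sketch below. The function name and signature are assumptions for illustration. The masks treat each pixel as a word exactly as the question's expression does, except that alpha is forced to 0xFF. Whether this beats the byte-wise loop depends on the compiler and target, so it is worth measuring both.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions, not from the
   thread). Swaps bits 0-7 with bits 16-23 of every pixel word, forces
   bits 24-31 to 0xFF, and writes the rows in reverse order. */
static void rgba_to_argb_flipped(const uint32_t *rgba, uint32_t *argb,
                                 size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        /* Row pointers are computed once per row, not once per pixel. */
        const uint32_t *src = rgba + y * width;
        uint32_t *dst = argb + (height - y - 1) * width;
        for (size_t x = 0; x < width; x++) {
            uint32_t p = src[x];
            dst[x] = (p & 0x0000ff00u)           /* bits 8-15 stay in place   */
                   | ((p << 16) & 0x00ff0000u)   /* bits 0-7   -> bits 16-23  */
                   | ((p >> 16) & 0x000000ffu)   /* bits 16-23 -> bits 0-7    */
                   | 0xff000000u;                /* alpha forced to opaque    */
        }
    }
}
```

Calling `rgba_to_argb_flipped(rgbaBuffer, argbBuffer, width, height)` would replace the whole nested loop, assuming both buffers hold `width * height` 32-bit pixels and do not overlap.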

趁早两清
#3 · 2019-01-27 02:17

I will focus only on the swap function:

typedef unsigned int Color32;

// static inline avoids a missing external definition under C99 inline rules
static inline Color32 Color32Reverse(Color32 x)
{
    return
    // Source is in format: 0xAARRGGBB
        ((x & 0xFF000000) >> 24) | //______AA
        ((x & 0x00FF0000) >>  8) | //____RR__
        ((x & 0x0000FF00) <<  8) | //__GG____
        ((x & 0x000000FF) << 24);  //BB______
    // Return value is in format:  0xBBGGRRAA
}
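The function above reverses all four bytes, which is one step more than the question needs. As a rough sketch of how it could still be used for this conversion: if the R, G, B, A bytes are loaded as one word on a little-endian CPU, the word reads 0xAABBGGRR, the reversal turns it into 0xRRGGBBAA, and a shift by 8 plus an OR yields 0xFFRRGGBB. `Color32ToOpaqueARGB` is a hypothetical name introduced here, not part of the answer.

```c
#include <stdint.h>

typedef uint32_t Color32;

/* Same byte reversal as in the answer, repeated so this block
   compiles on its own. */
static inline Color32 Color32Reverse(Color32 x)
{
    return ((x & 0xFF000000u) >> 24) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x000000FFu) << 24);
}

/* Hypothetical helper, not part of the original answer.
   Input word:  0xAABBGGRR (R, G, B, A bytes loaded on a little-endian CPU).
   Output word: 0xFFRRGGBB (ARGB with alpha forced to opaque). */
static inline Color32 Color32ToOpaqueARGB(Color32 x)
{
    return (Color32Reverse(x) >> 8) | 0xFF000000u;
}
```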
混吃等死
#4 · 2019-01-27 02:18

Use assembly. The following is for Intel x86, written in MSVC-style inline assembly.

This example swaps Red and Blue.

void* b = pixels;
UINT len = textureWidth*textureHeight;   // Number of pixels; must be non-zero

__asm
{
    mov ecx, len                // Set loop counter to the number of pixels
    mov ebx, b                  // Set ebx to pixels pointer
    label:
        mov al,[ebx+0]          // Load Red to al
        mov ah,[ebx+2]          // Load Blue to ah
        mov [ebx+0],ah          // Store Blue where Red was
        mov [ebx+2],al          // Store Red where Blue was
        add ebx,4               // Move by 4 bytes to next pixel
        dec ecx                 // Decrease loop counter
        jnz label               // If not zero jump to label
}
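For comparison, a plain C sketch of the same in-place loop is shown below. It is not the answerer's code, and `swap_red_blue_in_place` is a hypothetical name. It may be useful where inline assembly is unavailable (for example, MSVC does not accept `__asm` blocks when targeting x64), and an optimizing compiler may well generate comparable code from it. Unlike the assembly loop, it simply does nothing for a pixel count of zero.

```c
#include <stddef.h>

/* Hypothetical portable counterpart of the assembly loop: swaps byte 0
   (red) and byte 2 (blue) of every 4-byte pixel, in place. Bytes 1 and 3
   (green and alpha) are left untouched, as in the assembly version. */
static void swap_red_blue_in_place(unsigned char *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        unsigned char *p = pixels + 4 * i;  /* start of pixel i */
        unsigned char tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```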
爱情/是我丢掉的垃圾
#5 · 2019-01-27 02:22

The code you provided is strange: it does not shuffle the color components from rgba to argb, but from rgba to rabg.

I've written a correct and optimized version of this routine.

int size = width * height;

// Note: the output is filled back to front, so pixels within each row are
// mirrored too, unlike the row-only flip in the question.
for (unsigned int * rgba_ptr = rgbaBuffer, * argb_ptr = argbBuffer + size - 1; argb_ptr >= argbBuffer; rgba_ptr++, argb_ptr--)
{
    // *argb_ptr = *rgba_ptr >> 8 | 0xff000000;  // - this version doesn't change endianness
    *argb_ptr = __builtin_bswap32(*rgba_ptr) >> 8 | 0xff000000;  // This does (GCC/Clang builtin)
}

The first thing I did was simplify your shuffling expression: XRGB is just RGBA >> 8. I also removed the array index calculation from each iteration and used pointers as loop variables. This version is about 2 times faster than the original on my machine.

You can also use SSE for the shuffling if this code is intended for an x86 CPU.
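A minimal sketch of the SSE idea, assuming SSE2 only: four pixels are processed per iteration with the same mask-and-shift steps as the scalar code, and a scalar loop handles the tail (or everything, when SSE2 is not available at compile time). The function name is hypothetical, the pixel order is left unchanged (no flip), and the word layout assumed is 0xAABBGGRR in, 0xFFRRGGBB out. With SSSE3, a single `_mm_shuffle_epi8` could replace the shifts, but that is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

/* Hypothetical helper: converts `count` pixels from 0xAABBGGRR words to
   0xFFRRGGBB words without reordering them. Buffers must not overlap. */
static void rgba_to_argb_sse2(const uint32_t *src, uint32_t *dst, size_t count)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    const __m128i green = _mm_set1_epi32(0x0000ff00);
    const __m128i low   = _mm_set1_epi32(0x000000ff);
    const __m128i alpha = _mm_set1_epi32((int)0xff000000u);
    for (; i + 4 <= count; i += 4) {
        /* Unaligned load and store, so no alignment is required. */
        __m128i p = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i g = _mm_and_si128(p, green);                     /* bits 8-15 kept   */
        __m128i r = _mm_slli_epi32(_mm_and_si128(p, low), 16);   /* bits 0-7 -> 16-23 */
        __m128i b = _mm_and_si128(_mm_srli_epi32(p, 16), low);   /* bits 16-23 -> 0-7 */
        __m128i out = _mm_or_si128(_mm_or_si128(g, alpha), _mm_or_si128(r, b));
        _mm_storeu_si128((__m128i *)(dst + i), out);
    }
#endif
    /* Scalar tail: the same operation, one pixel at a time. */
    for (; i < count; i++) {
        uint32_t p = src[i];
        dst[i] = (p & 0x0000ff00u) | ((p & 0xffu) << 16)
               | ((p >> 16) & 0xffu) | 0xff000000u;
    }
}
```

To reproduce the vertical flip from the question, this could be called once per row with the destination row index reversed.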
