I have just started using SSE2 to optimize image processing, but I have no idea how to handle 3-channel, 24-bit color images.
My pixel data is arranged as BGR BGR BGR ..., with 8 bits (unsigned char) per channel. If I want to implement Color2Gray with the SSE2/SSE3/SSE4 C/C++ intrinsic functions, how should I do it? Does my pixel data need to be aligned (to 4/8/16 bytes)?
I have read this article: http://supercomputingblog.com/windows/image-processing-with-sse/
But it covers 4-channel, 32-bit ARGB color, where exactly 4 pixels are processed at a time.
Thanks!
// Assume the source pixels (3 bytes per pixel, BGR):
unsigned char* pDataColor = (unsigned char*)malloc(src.width * src.height * 3);
// ... initialize every pixel value in pDataColor ...
// The destination pixels (1 byte per pixel):
unsigned char* pDataGray = (unsigned char*)malloc(src.width * src.height * 1);
// RGB -> Gray: Y = 0.212671*R + 0.715160*G + 0.072169*B
I have slides on de-interleaving of 24-bit RGB pixels, which explain how to do it with SSE2 and SSSE3.
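As a rough illustration of the SSSE3 route (not the method from the slides, just one possible sketch), the code below spreads each BGR triple into its own 32-bit lane with `_mm_shuffle_epi8` and then applies a weighted multiply-add. The function name `bgr_to_gray` and the 7-bit fixed-point weights 9/92/27 (an approximation of the coefficients in the question, scaled to sum to 128) are assumptions made for this example. The SIMD path is compiled only when the compiler defines `__SSSE3__` (e.g. GCC or Clang with `-mssse3`); otherwise the scalar loop handles everything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSSE3__
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8, _mm_maddubs_epi16 */
#endif

/* Assumed 7-bit fixed-point weights (sum = 128), rounded from
   0.072169 (B), 0.715160 (G), 0.212671 (R). */
enum { WB = 9, WG = 92, WR = 27 };

/* Scalar reference for one BGR pixel, with rounding. */
static unsigned char gray_scalar(const unsigned char *px) {
    return (unsigned char)((WB * px[0] + WG * px[1] + WR * px[2] + 64) >> 7);
}

/* Hypothetical helper: converts n packed BGR pixels to n gray bytes. */
void bgr_to_gray(const unsigned char *bgr, unsigned char *gray, size_t n) {
    size_t p = 0;
    assert(bgr != NULL && gray != NULL);
#ifdef __SSSE3__
    {
        /* Move pixel k's B,G,R into 32-bit lane k; -1 zeroes the 4th byte. */
        const __m128i spread = _mm_setr_epi8(0, 1, 2, -1, 3, 4, 5, -1,
                                             6, 7, 8, -1, 9, 10, 11, -1);
        const __m128i weights = _mm_setr_epi8(WB, WG, WR, 0, WB, WG, WR, 0,
                                              WB, WG, WR, 0, WB, WG, WR, 0);
        const __m128i ones = _mm_set1_epi16(1);
        const __m128i half = _mm_set1_epi32(64);
        /* Each iteration loads 16 bytes but consumes 12 (4 pixels),
           so only run while 16 readable bytes remain. */
        for (; 3 * p + 16 <= 3 * n; p += 4) {
            __m128i v = _mm_loadu_si128((const __m128i *)(bgr + 3 * p));
            int four;
            v = _mm_shuffle_epi8(v, spread);
            v = _mm_maddubs_epi16(v, weights);  /* B*9+G*92 | R*27+0 */
            v = _mm_madd_epi16(v, ones);        /* sum each pair to 32 bits */
            v = _mm_srli_epi32(_mm_add_epi32(v, half), 7);
            v = _mm_packs_epi32(v, v);          /* 32-bit -> 16-bit */
            v = _mm_packus_epi16(v, v);         /* 16-bit -> 8-bit */
            four = _mm_cvtsi128_si32(v);        /* low 4 bytes = 4 gray pixels */
            memcpy(gray + p, &four, 4);
        }
    }
#endif
    /* Scalar tail (or the whole image when SSSE3 is not enabled). */
    for (; p < n; ++p)
        gray[p] = gray_scalar(bgr + 3 * p);
}
```

With the buffers from the question, a call would look like `bgr_to_gray(pDataColor, pDataGray, (size_t)src.width * src.height);`. The unaligned load means no particular alignment of the pixel data is assumed, and processing more than 4 pixels per iteration would need a fuller de-interleave than this sketch shows.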
Here are some answers to your question:
- How to use the SSE2 C/C++ intrinsic functions: these references may be helpful.
- Optimization of Image Processing Algorithms: A Case Study
- Speeding up some SSE2 Intrinsics for color conversion
- SSE intrinsic functions reference
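- Alignment: 16-byte alignment is required only for the aligned load/store intrinsics such as _mm_load_si128 and _mm_store_si128, which fault on a misaligned address. The unaligned variants, _mm_loadu_si128 and _mm_storeu_si128, accept any address (they can be slower on older CPUs), and with 3-byte pixels most pixel boundaries are not 16-byte aligned anyway. To align a variable, use __declspec(align(16)) with MSVC or __attribute__((aligned(16))) with GCC. For heap buffers, use an aligned allocator such as _mm_malloc, since plain malloc does not guarantee 16-byte alignment on every platform.
- The reason alignment matters is explained here: Why does instruction/data alignment exist?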
- 3-channel RGB conversion: I am not an image-processing expert, so I cannot give advice here. There are also open source image-processing libraries that may already contain the code you want.
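On the alignment point, here is a minimal sketch of the difference between aligned and unaligned SSE2 loads, assuming an x86 compiler where `<emmintrin.h>` also makes `_mm_malloc`/`_mm_free` available (true for GCC, Clang and MSVC as far as I know). The helper names `is_aligned16`, `load16_aligned`, `load16_any` and `alloc_pixels` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; pulls in _mm_malloc/_mm_free */

/* Returns nonzero when p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Aligned load: the address must be a multiple of 16, otherwise the
   instruction faults at run time. The assert documents that precondition. */
static __m128i load16_aligned(const unsigned char *p) {
    assert(is_aligned16(p));
    return _mm_load_si128((const __m128i *)p);
}

/* Unaligned load: valid for any address, e.g. the start of an arbitrary
   3-byte BGR pixel inside the buffer. */
static __m128i load16_any(const unsigned char *p) {
    return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical allocation helper: heap buffer aligned to 16 bytes.
   Release it with _mm_free, not free. */
static unsigned char *alloc_pixels(size_t bytes) {
    return (unsigned char *)_mm_malloc(bytes, 16);
}
```

A buffer from `alloc_pixels` can be read with `load16_aligned` at offsets 0, 16, 32, and so on, while a pixel at byte offset 3 needs `load16_any`. For packed 24-bit data that usually means either using unaligned loads throughout or processing 48 bytes (16 pixels) per iteration so that every load starts on a 16-byte boundary.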