Question:

I just started using SSE2 to optimize image processing, but I have no idea how to handle 3-channel, 24-bit color images. My pixel data is arranged as BGR BGR BGR ..., unsigned char (8 bits per channel). If I want to implement Color2Gray in C/C++ with SSE2/SSE3/SSE4 intrinsics, how would I do it? Does my pixel data need to be aligned (4/8/16 bytes)? I have read this article: http://supercomputingblog.com/windows/image-processing-with-sse/ but it covers 4-channel, 32-bit ARGB color, which processes exactly 4 pixels at a time. Thanks!

// Source: packed 24-bit pixels, 3 bytes per pixel in B, G, R order
unsigned char* pDataColor = (unsigned char*)malloc(src.width * src.height * 3);
// ... initialize every pixel value in pDataColor ...

// Destination: 8-bit gray, 1 byte per pixel
unsigned char* pDataGray = (unsigned char*)malloc(src.width * src.height * 1);

// RGB -> Gray: Y = 0.212671*R + 0.715160*G + 0.072169*B
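A minimal scalar sketch of the formula above, useful as a reference to check any SIMD version against. It assumes 8.8 fixed-point weights (54, 183, 19, my own rounding of the coefficients, not from the question) so the arithmetic stays in integers, which is also what a SIMD version would do with 16-bit lanes. The function name bgr_to_gray_scalar is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8.8 fixed-point weights: 0.212671*256 ~ 54, 0.715160*256 ~ 183,
   0.072169*256 ~ 18.5, rounded up to 19 so the weights sum to exactly 256
   (a white pixel then maps to 255). */
enum { W_R = 54, W_G = 183, W_B = 19 };

/* Converts packed BGR BGR ... (8 bits per channel) to 8-bit gray.
   One pixel per iteration, no alignment requirement. */
void bgr_to_gray_scalar(const uint8_t *bgr, uint8_t *gray, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; ++i) {
        unsigned b = bgr[3 * i + 0];
        unsigned g = bgr[3 * i + 1];
        unsigned r = bgr[3 * i + 2];
        /* +128 rounds to nearest before the divide by 256; max sum is 65408 */
        gray[i] = (uint8_t)((W_R * r + W_G * g + W_B * b + 128) >> 8);
    }
}
```

Because the largest intermediate value (255 * 256 + 128) fits in 16 bits, the same computation maps directly onto SSE2 16-bit lanes.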

Answer 1:

I have slides on de-interleaving of 24-bit RGB pixels, which explain how to do it with SSE2 and SSSE3.
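The de-interleaving idea can be sketched with SSSE3's _mm_shuffle_epi8 (pshufb). This is my own sketch, not the code from those slides. It assumes packed BGR input, the same hypothetical 8.8 fixed-point weights 54/183/19, GCC or Clang (for the target("ssse3") attribute; MSVC does not need it), and a hypothetical function name bgr_to_gray_ssse3. Each iteration loads 48 bytes (16 pixels) with unaligned loads, gathers each channel with three pshufb masks (a mask byte of -1 writes a zero, so the three partial results can simply be OR'd), then widens to 16-bit lanes for the weighted sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (includes SSE2 headers) */

enum { W_R = 54, W_G = 183, W_B = 19 }; /* hypothetical 8.8 fixed-point weights */

__attribute__((target("ssse3")))
void bgr_to_gray_ssse3(const uint8_t *bgr, uint8_t *gray, size_t num_pixels)
{
    /* Shuffle masks: lane j takes byte mask[j] of the source; -1 yields 0. */
    const __m128i b0 = _mm_setr_epi8(0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i b1 = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14, -1, -1, -1, -1, -1);
    const __m128i b2 = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 4, 7, 10, 13);
    const __m128i g0 = _mm_setr_epi8(1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i g1 = _mm_setr_epi8(-1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1);
    const __m128i g2 = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14);
    const __m128i r0 = _mm_setr_epi8(2, 5, 8, 11, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1);
    const __m128i r1 = _mm_setr_epi8(-1, -1, -1, -1, -1, 1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1);
    const __m128i r2 = _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15);
    const __m128i zero = _mm_setzero_si128();
    const __m128i wr = _mm_set1_epi16(W_R), wg = _mm_set1_epi16(W_G), wb = _mm_set1_epi16(W_B);
    const __m128i bias = _mm_set1_epi16(128); /* rounding term */

    size_t i = 0;
    for (; i + 16 <= num_pixels; i += 16) {
        const uint8_t *p = bgr + 3 * i;
        /* 48 bytes = 16 BGR pixels; unaligned loads, so no alignment is assumed */
        __m128i v0 = _mm_loadu_si128((const __m128i *)(p + 0));
        __m128i v1 = _mm_loadu_si128((const __m128i *)(p + 16));
        __m128i v2 = _mm_loadu_si128((const __m128i *)(p + 32));

        /* De-interleave into 16 B, 16 G and 16 R bytes */
        __m128i b = _mm_or_si128(_mm_or_si128(_mm_shuffle_epi8(v0, b0),
                                              _mm_shuffle_epi8(v1, b1)),
                                 _mm_shuffle_epi8(v2, b2));
        __m128i g = _mm_or_si128(_mm_or_si128(_mm_shuffle_epi8(v0, g0),
                                              _mm_shuffle_epi8(v1, g1)),
                                 _mm_shuffle_epi8(v2, g2));
        __m128i r = _mm_or_si128(_mm_or_si128(_mm_shuffle_epi8(v0, r0),
                                              _mm_shuffle_epi8(v1, r1)),
                                 _mm_shuffle_epi8(v2, r2));

        /* Widen to 16 bits: (W_R*R + W_G*G + W_B*B + 128) fits in an unsigned 16-bit lane */
        __m128i lo = _mm_add_epi16(bias, _mm_mullo_epi16(_mm_unpacklo_epi8(r, zero), wr));
        lo = _mm_add_epi16(lo, _mm_mullo_epi16(_mm_unpacklo_epi8(g, zero), wg));
        lo = _mm_add_epi16(lo, _mm_mullo_epi16(_mm_unpacklo_epi8(b, zero), wb));
        __m128i hi = _mm_add_epi16(bias, _mm_mullo_epi16(_mm_unpackhi_epi8(r, zero), wr));
        hi = _mm_add_epi16(hi, _mm_mullo_epi16(_mm_unpackhi_epi8(g, zero), wg));
        hi = _mm_add_epi16(hi, _mm_mullo_epi16(_mm_unpackhi_epi8(b, zero), wb));

        /* Logical shift by 8 gives 0..255; pack back to 16 bytes */
        lo = _mm_srli_epi16(lo, 8);
        hi = _mm_srli_epi16(hi, 8);
        _mm_storeu_si128((__m128i *)(gray + i), _mm_packus_epi16(lo, hi));
    }
    /* Scalar tail for the last num_pixels % 16 pixels */
    for (; i < num_pixels; ++i) {
        const uint8_t *p = bgr + 3 * i;
        gray[i] = (uint8_t)((W_R * p[2] + W_G * p[1] + W_B * p[0] + 128) >> 8);
    }
}
```

The tail loop uses the same weights, so the SIMD and scalar paths produce identical results for any image width.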

Answer 2:

Here are some answers to your question:

On how to use the SSE2 intrinsic functions in C/C++, these references may be helpful:
- Optimization of Image Processing Algorithms: A Case Study
- Speeding up some SSE2 Intrinsics for color conversion
- SSE intrinsic functions reference
On alignment: 16-byte alignment is required when you use the aligned load/store intrinsics (the SSE2/SSE3/SSE4 instruction C/C++ functions such as _mm_load_si128 and _mm_store_si128); passing an address that is not 16-byte aligned to them makes the program fault. The unaligned variants (_mm_loadu_si128, _mm_storeu_si128) accept any address. To align a statically declared array, use __declspec(align(16)) with MSVC or __attribute__((aligned(16))) with GCC; for heap buffers, use an aligned allocator such as _mm_malloc.
- Why alignment matters is explained here: Why does instruction/data alignment exist?
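A sketch of the aligned path for 24-bit data, assuming GCC or Clang on x86-64 (where <emmintrin.h> also declares _mm_malloc and _mm_free); the helper names are hypothetical. It relies on the fact that 16 BGR pixels occupy 48 = 3 * 16 bytes, so if the buffer starts on a 16-byte boundary, every 16-pixel group starts on one too, and the three loads per group can use _mm_load_si128.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; on GCC/Clang also brings in _mm_malloc/_mm_free */

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p) { return ((uintptr_t)p & 15u) == 0; }

/* Hypothetical helper: 16-byte-aligned pixel buffer; release it with _mm_free. */
uint8_t *alloc_aligned16(size_t bytes) { return (uint8_t *)_mm_malloc(bytes, 16); }

/* Loads the 16 BGR pixels (48 bytes) of group k with ALIGNED loads.
   Valid only if bgr itself is 16-byte aligned: 48 * k is then a multiple of 16. */
void load_bgr_group_aligned(const uint8_t *bgr, size_t k, __m128i out[3])
{
    const __m128i *p = (const __m128i *)(bgr + 48 * k);
    out[0] = _mm_load_si128(p + 0); /* would fault on a misaligned address */
    out[1] = _mm_load_si128(p + 1);
    out[2] = _mm_load_si128(p + 2);
}
```

This only holds for the buffer as a whole: if every image row must start aligned, the row stride (width * 3 bytes) would have to be padded to a multiple of 16. MSVC offers _aligned_malloc/_aligned_free for the same purpose.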
For the 3-channel RGB conversion itself, I am not an image-processing expert, so I can't give advice. Some open-source image-processing libraries may already contain the code you want.