Fast way to swap endianness using OpenCL

I'm reading and writing lots of FITS and DNG images, which may contain data whose endianness differs from that of my platform and/or OpenCL device.

Currently I swap the byte order in host memory when necessary, which is very slow and requires an extra step.

Is there a fast way to pass a buffer of int/float/short with the wrong endianness to an OpenCL kernel?

Using an extra kernel run just to fix the endianness would be OK; an overhead-free read/write operation that fixes it automatically would be perfect.

I know about the variable attribute endian(host) / endian(device), but this doesn't help with a big-endian FITS file on a little-endian platform using a little-endian device.

I thought about a solution like this one (neither implemented nor tested, yet):

// shuffle() needs a mask whose element size matches the data, so uchar4 here
uchar4 mask = (uchar4)(3, 2, 1, 0);
uchar4 swappedEndianness = shuffle(originalEndianness, mask);
// to be applied on a float/int-buffer somehow

Hoping there's a better solution out there.

Thanks in advance, runtimeterror

Tags: buffer byte opencl endianness

2 Answers

疯言疯语

Reply #2 · 2019-08-26 01:56

Most processor architectures perform best with instructions that operate on their full register width, for example 32 or 64 bits. When a CPU or GPU performs byte-wise operations, such as the .wxyz subscripts on a uchar4, it has to mask each byte out of the integer, shift it, and then merge it into the result with an integer add or OR. For an endianness swap, the processor has to run this AND, shift, add/OR sequence four times, because there are 4 bytes.
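To make that cost concrete, here is a minimal sketch of the byte-by-byte approach. It is written in plain host-side C rather than OpenCL C so it can be compiled anywhere, and the name swap_bytewise is made up for illustration. Each of the four bytes needs its own mask, shift and OR:

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of a 32-bit word
   one byte at a time (mask, shift, OR for each of the 4 bytes). */
static uint32_t swap_bytewise(uint32_t n)
{
    uint32_t result = 0;
    result |= (n & 0x000000FFu) << 24; /* lowest byte  -> highest byte */
    result |= (n & 0x0000FF00u) << 8;  /* second byte  -> third byte   */
    result |= (n & 0x00FF0000u) >> 8;  /* third byte   -> second byte  */
    result |= (n & 0xFF000000u) >> 24; /* highest byte -> lowest byte  */
    return result;
}
```

That is four masks, four shifts and four ORs per 32-bit word, which is the work the approach below tries to reduce.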

The most efficient way is as follows:

#define EndianSwap(n) (rotate((n) & 0x00FF00FF, 24U) | (rotate((n), 8U) & 0x00FF00FF))

n can be of any unsigned 32-bit integer gentype, for example a uint4 variable. Because OpenCL C does not allow C++-style function overloading, a macro is the best choice.
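The bit pattern behind this can be checked on the host. The sketch below is plain C, not OpenCL C: rotl32 is an assumed stand-in for OpenCL's rotate() on a single uint, and swap_rotate is a made-up name that mirrors the macro for one 32-bit value.

```c
#include <stdint.h>

/* Plain-C stand-in for OpenCL's rotate() on a uint: rotate left by `bits`. */
static uint32_t rotl32(uint32_t v, unsigned bits)
{
    bits &= 31u;
    if (bits == 0) {
        return v; /* avoid an undefined shift by 32 */
    }
    return (v << bits) | (v >> (32u - bits));
}

/* Same pattern as the macro: two rotates, two masks, one OR. */
static uint32_t swap_rotate(uint32_t n)
{
    return rotl32(n & 0x00FF00FFu, 24u) | (rotl32(n, 8u) & 0x00FF00FFu);
}
```

For n = 0xAABBCCDD, the first term masks to 0x00BB00DD and rotates to 0xDD00BB00; the second rotates to 0xBBCCDDAA and masks to 0x00CC00AA; the OR of both is 0xDDCCBBAA. Each rotate moves two bytes at once, which is why two rotates are enough.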

小情绪 Triste *

Reply #3 · 2019-08-26 02:05

Sure. Since you have a uchar4, you can simply swizzle the components and write them back.

output[tid] = input[tid].wzyx;

Swizzling is also very cheap on SIMD architectures, so you should be able to combine it with other operations in your kernel.
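As a rough model of what the .wzyx swizzle does to each element, here is a plain-C sketch (host-side C, not kernel code) that treats a buffer of 4-byte values as groups of four bytes and reverses each group in place. The name swap_buffer32 is hypothetical; in a kernel, each loop iteration would correspond to one work-item.

```c
#include <stddef.h>

/* Hypothetical helper: reverses the 4 bytes of every 32-bit element in place,
   mirroring what .wzyx does to one uchar4. */
static void swap_buffer32(void *buf, size_t count)
{
    unsigned char *p = (unsigned char *)buf;
    for (size_t i = 0; i < count; ++i, p += 4) {
        unsigned char x = p[0], y = p[1], z = p[2], w = p[3];
        p[0] = w; /* .w */
        p[1] = z; /* .z */
        p[2] = y; /* .y */
        p[3] = x; /* .x */
    }
}
```

Because only bytes are moved, the same routine works for int and float buffers alike; applying it twice restores the original data.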

Hope this helps!
