I have an array of uint-types in C#, After checking if the program is working on a little-endian machine, I want to convert the data to a big-endian type. Because the amount of data can become very large but is always even, I was thinking to consider two uint types as an ulong type, for a better performance and program it in ASM, so I am searching for a very fast (the fastest if possible) Assembler-algorithm to convert little-endian in big-endian.
相关问题
- Sorting 3 numbers without branching [closed]
- Graphics.DrawImage() - Throws out of memory except
- Why am I getting UnauthorizedAccessException on th
- 求获取指定qq 资料的方法
- How to know full paths to DLL's from .csproj f
Well, that would also swap the two uint values, which might not be desirable...
You could try some C# code in unsafe mode, which may actually perform well enough. Like:
On my computer the throughput is around 2 GB per second.
You may want to simply rethink the problem, this should not be a bottleneck. Take the naive algorithm (written in CLI assembly, just for fun). lets assume the number we want is in local number 0
At most that's 13 (x86) assembly instructions per number (and most likely the interpreter will be even smarter by using clever registers). And it doesn't get more naive than that.
Now, compare that to the costs of
If 13 instructions per number is a significant chunk of your execution time, then you are doing a VERY high performance task and should have your input in the correct format! You also probably would not be using a managed language because you would want far more control over buffers of data and what-not, and no extra array bounds checking.
If that array of data comes across a network, I would expect there to be much greater costs from the managing of sockets than from a mere byte order flip, if its from disk, consider pre-flipping before executing this program.
For a large amount of data, the
bswap
instruction (available in Visual C++ under the_byteswap_ushort
,_byteswap_ulong
, and_byteswap_uint64
intrinsics) is the way to go. This will even outperform handwritten assembly. These are not available in pure C# without P/Invoke, so:PS: Many people are unaware of the byte swap intrinsics. Their performance is astonishing, doubly so for floating point data because it processes them as integers. There is no way to beat it without hand coding your register loads for every single byte swap use case, and should you try that, you'll probably incur a bigger hit in the optimizer than you'll ever pick up.