I'm not sure of the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes, with each byte storing one row. When I'm finished, I'd like each byte to store one column.
For example, when I'm finished:
Byte0out = Byte0inBit0 + Byte1inBit0 + Byte2inBit0 + Byte3inBit0 + ...
Byte1out = Byte0inBit1 + Byte1inBit1 + Byte2inBit1 + Byte3inBit1 + ...
What is the easiest way to do this in C which performs well?
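For reference, the mapping described above can be written as a plain double loop. This is only a minimal sketch, not a tuned solution. The function name `transpose8x8_naive` is made up for illustration, and it assumes that "bit 0" means the least significant bit and that row r ends up in bit r of each output byte, which the question does not state.

```c
#include <stdint.h>

/* Transpose an 8x8 bit block: in[r] holds row r, out[c] receives column c.
   Bit c of in[r] becomes bit r of out[c].
   Assumption: bit 0 is the least significant bit of each byte. */
void transpose8x8_naive(const uint8_t in[8], uint8_t out[8])
{
    for (int c = 0; c < 8; c++) {
        uint8_t col = 0;
        for (int r = 0; r < 8; r++) {
            /* Pick bit c of row r and place it at bit r of the column byte. */
            col |= (uint8_t)(((in[r] >> c) & 1u) << r);
        }
        out[c] = col;
    }
}
```

A slow but obviously correct version like this is mainly useful as a baseline to check faster variants against.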
Lisp prototype:
This is how you can run the code:
Occasionally I disassemble code to check that there are no unnecessary calls to safety functions.
This is a benchmark. Run the function often enough to process a (binary) HDTV image.
That took only 51 ms. Note that I'm consing quite a lot because the function allocates new 8-byte arrays all the time. I'm sure an implementation in C can be tweaked a lot more.
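To give an idea of what an allocation-free C version might look like, here is a sketch that packs the 8 rows into one 64-bit word and transposes it with three delta swaps, a well-known bit trick (a variant appears in Hacker's Delight). It is not a translation of the Lisp prototype. The names `pack_rows`, `unpack_rows`, `transpose8x8_u64` and `transpose8x8_swar` are made up for illustration, and it assumes bit 0 is the least significant bit and that row r lands in bit r of each output byte.

```c
#include <stdint.h>

/* Pack 8 row bytes into one word, row r in bits 8r..8r+7.
   Shifts are used so the result does not depend on host endianness. */
static uint64_t pack_rows(const uint8_t in[8])
{
    uint64_t x = 0;
    for (int r = 0; r < 8; r++)
        x |= (uint64_t)in[r] << (8 * r);
    return x;
}

/* Inverse of pack_rows. */
static void unpack_rows(uint64_t x, uint8_t out[8])
{
    for (int r = 0; r < 8; r++)
        out[r] = (uint8_t)(x >> (8 * r));
}

/* Transpose the packed 8x8 bit matrix with three delta swaps:
   first swap the off-diagonal bits inside every 2x2 block, then the
   off-diagonal 2x2 blocks inside every 4x4 block, then the two
   off-diagonal 4x4 blocks. */
uint64_t transpose8x8_u64(uint64_t x)
{
    uint64_t t;
    t = (x ^ (x >> 7))  & 0x00AA00AA00AA00AAULL; x ^= t ^ (t << 7);
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL; x ^= t ^ (t << 14);
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL; x ^= t ^ (t << 28);
    return x;
}

/* Byte-array wrapper: in[r] is row r, out[c] is column c. */
void transpose8x8_swar(const uint8_t in[8], uint8_t out[8])
{
    unpack_rows(transpose8x8_u64(pack_rows(in)), out);
}
```

No memory is allocated per block, and the core is a handful of shifts, XORs and ANDs. Whether it beats other approaches on a given machine would have to be measured.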
Here are some more test cases:
Now I really want to see how my code compares to Andrejs Cainikovs' C solution (Edit: I think it's wrong):
And benchmarking it:
Each loop over the HDTV image takes 2.5 ms. That is quite a lot faster than my unoptimized Lisp.
Unfortunately, the C code doesn't give the same results as my Lisp: