I want to multiply a 8x8 binary matrix represented as a unsigned 64 bit integer by a 8 bit vector represented by a unsigned char. However, due to some other issues the matrix must be ordered by columns, ergo there's no easy matching of bytes for easy multiplication.
Any idea how to speed up such a calculation? Every operation counts for I need billions of such calculations made.
The multiplications are made over a 2 element field (F-2).
With this matrix and vector representation, it helps to do matrix multiplication this way:
(col1 ... col8) * (v1 ... v8)T = col1 * v1 + ... + col8 * v8
where matrix A = (col1 ... col8)
and column vector v = (v1 ... v8)T
Thinking this further, you can do all multiplications at once if you inflate the 8-bit vector to a 64-bit vector by repeating every bit 8 times and then calculating P = A & v_inflated
. The only thing left then, is the addition (i.e. XOR) of the products.
A simple approach for XORing the products is.
uint64_t P = calculated products from text above;
uint64_t sum = 0;
for( int i = 8; i; --i )
{
sum ^= P & 0xFF;
P >> 8;
}
You ONLY HAVE 256 vectors! Use lookup tables to generate the right bitmasks, then your logic will be something like
output_bit_n = bool (matrix [n] & lookup [vector])
In other words, your lookup table can transpose an 8-bit value into the 64-bit world.
You can efficiently pack this into the result with rotate-with-carry instructions if the compiler isn't smart enough to optimise (value<<=1)|=result
.