可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I want to multiply a 8x8 binary matrix represented as a unsigned 64 bit integer by a 8 bit vector represented by a unsigned char. However, due to some other issues the matrix must be ordered by columns, ergo there's no easy matching of bytes for easy multiplication.

Any idea how to speed up such a calculation? Every operation counts for I need billions of such calculations made.

The multiplications are made over a 2 element field (F-2).

回答1:

With this matrix and vector representation, it helps to do matrix multiplication this way:

(col₁ ... col₈) * (v₁ ... v₈)^T = col₁ * v₁ + ... + col₈ * v₈

where matrix A = (col₁ ... col₈)

and column vector v = (v₁ ... v₈)^T

Thinking this further, you can do all multiplications at once if you inflate the 8-bit vector to a 64-bit vector by repeating every bit 8 times and then calculating P = A & v_inflated. The only thing left then, is the addition (i.e. XOR) of the products.

A simple approach for XORing the products is.

uint64_t P = calculated products from text above;
uint64_t sum = 0;
for( int i = 8; i; --i )
{
   sum ^= P & 0xFF;
   P >> 8;  
}

回答2:

You ONLY HAVE 256 vectors! Use lookup tables to generate the right bitmasks, then your logic will be something like

output_bit_n = bool (matrix [n] & lookup [vector])

In other words, your lookup table can transpose an 8-bit value into the 64-bit world.

You can efficiently pack this into the result with rotate-with-carry instructions if the compiler isn't smart enough to optimise (value<<=1)|=result.