GCC couldn't vectorize 64-bit multiplication.

2019-05-04 19:16发布

问题:

I try to vectorize a CBRNG which uses 64bit widening multiplication.

static __inline__ uint64_t mulhilo64(uint64_t a, uint64_t b, uint64_t* hip) {
    __uint128_t product = ((__uint128_t)a)*((__uint128_t)b);
    *hip = product>>64;
    return (uint64_t)product;
}

Is such a multiplication exists in a vectorized form in AVX2?

回答1:

No. There's no 64 x 64 -> 128 bit arithmetic as a vector instruction. Nor is there a vector mulhi type instruction (high word result of multiply).

[V]PMULUDQ can do 32 x 32 -> 64 bit by only considering every second 32 bit unsigned element, or unsigned doubleword, as a source, and expanding each 64 bit result into two result elements combined as an unsigned quadword.

The best you can probably hope for right now is Haswell's MULX instruction, which has more flexible register use, and does not affect the flags register - eliminating some stalls.