GCC couldn't vectorize 64-bit multiplication.

Posted 2019-05-04 19:16


做自己的国王


Question:

I am trying to vectorize a counter-based RNG (CBRNG) that uses a 64-bit widening multiplication.

#include <stdint.h>

/* Returns the low 64 bits of a*b and stores the high 64 bits in *hip. */
static __inline__ uint64_t mulhilo64(uint64_t a, uint64_t b, uint64_t* hip) {
    __uint128_t product = ((__uint128_t)a)*((__uint128_t)b);
    *hip = product>>64;
    return (uint64_t)product;
}

Does such a multiplication exist in vectorized form in AVX2?

Answer 1:

No. There's no 64 x 64 -> 128 bit arithmetic as a vector instruction. Nor is there a vector mulhi-type instruction (one returning the high half of the product) for 64-bit elements; [V]PMULHW and [V]PMULHUW provide that only for 16-bit elements.

[V]PMULUDQ can do 32 x 32 -> 64 bit multiplication: it uses only every second 32-bit unsigned element (the low doubleword of each 64-bit lane) as a source, and writes each full 64-bit product as an unsigned quadword spanning two 32-bit element positions.
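A 32 x 32 -> 64 bit multiply is, in principle, enough to assemble the full 128-bit product from four partial products. Below is a minimal scalar sketch of that arithmetic, assuming GCC or Clang for `__uint128_t`. The names `mul32x32`, `mulhilo64_split` and `check_against_uint128` are hypothetical, introduced only for illustration. Each `mul32x32` call stands in for what one [V]PMULUDQ lane would compute; a vector version would also need shifts, masks and 64-bit adds. Whether that beats scalar code is not something this sketch establishes.

```c
#include <assert.h>
#include <stdint.h>

/* One 32 x 32 -> 64 bit multiply of the low halves of x and y:
   the operation [V]PMULUDQ performs in each 64-bit lane. */
static inline uint64_t mul32x32(uint64_t x, uint64_t y) {
    return (uint64_t)(uint32_t)x * (uint32_t)y;
}

/* Hypothetical helper with the same contract as mulhilo64, built
   only from 32 x 32 -> 64 partial products. */
static inline uint64_t mulhilo64_split(uint64_t a, uint64_t b, uint64_t *hip) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t ll = mul32x32(a_lo, b_lo);   /* weight 2^0  */
    uint64_t lh = mul32x32(a_lo, b_hi);   /* weight 2^32 */
    uint64_t hl = mul32x32(a_hi, b_lo);   /* weight 2^32 */
    uint64_t hh = mul32x32(a_hi, b_hi);   /* weight 2^64 */

    /* Middle column: three values below 2^32, so the sum fits in 64 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    *hip = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return (mid << 32) | (uint32_t)ll;
}

/* Hypothetical self-check against the compiler's 128-bit type. */
static void check_against_uint128(uint64_t a, uint64_t b) {
    __uint128_t ref = (__uint128_t)a * b;
    uint64_t hi;
    uint64_t lo = mulhilo64_split(a, b, &hi);
    assert(lo == (uint64_t)ref);
    assert(hi == (uint64_t)(ref >> 64));
}
```

Calling `check_against_uint128(a, b)` for a few inputs, including `UINT64_MAX`, is a quick way to confirm that the carry handling in the middle column is right.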

The best you can probably hope for right now is the scalar MULX instruction (part of BMI2, introduced with Haswell), which computes the full 64 x 64 -> 128 bit product, has more flexible register use than MUL, and does not affect the flags register, eliminating some stalls.