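I'm trying to understand SSE2's capabilities a little better, and would like to know whether one could build a 128-bit wide integer that supports addition, subtraction, XOR and multiplication.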
Answer 1:
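SSE2 has no carry flag, but the carry is easy to compute as carry = sum < a
or carry = sum < b, provided the comparison is unsigned.
Worse, SSE2 has no 64-bit comparison either, so you need a workaround, such as building one out of 32-bit comparisons.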
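The carry rule can be illustrated on a single 64-bit value in scalar C. This is only a minimal sketch; the helper name add_carry_out is hypothetical and not part of the answer.

```c
#include <stdint.h>

/* Hypothetical helper: add two unsigned 64-bit values and report the carry-out. */
static uint64_t add_carry_out(uint64_t a, uint64_t b, unsigned *carry)
{
    uint64_t sum = a + b;   /* unsigned addition wraps modulo 2^64 */
    *carry = (sum < a);     /* the sum wrapped exactly when it is smaller than an operand */
    return sum;
}
```

For example, adding 2 to UINT64_MAX wraps around to 1 and sets the carry, while 1 + 2 gives 3 with no carry. Comparing against b instead of a gives the same answer.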
Below is untested, unoptimized C code based on the idea above.
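#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdbool.h>
/* Unsigned a < b on the low 64-bit halves, built from 32-bit compares. */
static inline bool lessthan(__m128i a, __m128i b){
    /* Flip the sign bits so the signed 32-bit compares behave as unsigned ones. */
    a = _mm_xor_si128(a, _mm_set1_epi32(0x80000000));
    b = _mm_xor_si128(b, _mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a, b);  /* per 32-bit lane: a < b */
    __m128i u = _mm_cmpgt_epi32(a, b);  /* per 32-bit lane: a > b */
    /* True if either 32-bit part is less ... */
    __m128i z = _mm_or_si128(t, _mm_shuffle_epi32(t, 177));
    /* ... unless the upper 32-bit part is greater. */
    z = _mm_andnot_si128(_mm_shuffle_epi32(u, 245), z);
    return _mm_cvtsi128_si32(z) & 1;
}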
static inline __m128i addi128(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi64(a, b);  /* adds the two 64-bit halves independently */
    /* lessthan() is an unsigned compare of the low halves: the low half
       wrapped, and so produced a carry, exactly when sum < a. */
    if (lessthan(sum, a))
    {
        __m128i ONE = _mm_set_epi64x(1, 0);  /* 1 in the high half, 0 in the low half */
        sum = _mm_add_epi64(sum, ONE);
    }
    return sum;
}
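The branch on the carry can also be avoided. The following is a sketch of one possible branch-free variant, not something the answer itself proposes: it keeps the comparison result as a full 64-bit mask, moves that mask into the high half, and subtracts it (subtracting -1 adds 1). The names cmplt_epu64_mask and add128_branchless are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Per 64-bit lane: all ones where a < b (unsigned), otherwise zero. */
static inline __m128i cmplt_epu64_mask(__m128i a, __m128i b)
{
    /* Flip sign bits so signed 32-bit compares act as unsigned ones. */
    const __m128i bias = _mm_set1_epi32((int)0x80000000u);
    a = _mm_xor_si128(a, bias);
    b = _mm_xor_si128(b, bias);
    __m128i lt = _mm_cmplt_epi32(a, b);
    __m128i gt = _mm_cmpgt_epi32(a, b);
    /* Either 32-bit part is less, and the upper part is not greater. */
    __m128i z = _mm_or_si128(lt, _mm_shuffle_epi32(lt, 0xB1));
    return _mm_andnot_si128(_mm_shuffle_epi32(gt, 0xF5), z);
}

/* 128-bit addition; the carry out of bit 127 is discarded. */
static inline __m128i add128_branchless(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi64(a, b);           /* two independent 64-bit adds */
    __m128i carry = cmplt_epu64_mask(sum, a);    /* low lane is -1 if the low half wrapped */
    carry = _mm_slli_si128(carry, 8);            /* move that mask to the high lane, zero the low lane */
    return _mm_sub_epi64(sum, carry);            /* subtracting -1 adds the carry */
}
```

Whether this is faster than the branching version depends on how predictable the carry is in the actual data, so it would need measuring.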
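As you can see, the code requires many more instructions, and even after optimization it will probably still be much longer than the simple two-instruction ADD/ADC pair on x86_64 (or four instructions on 32-bit x86).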
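For comparison, here is a scalar sketch of the same addition using two 64-bit halves. The u128 struct and u128_add are hypothetical names chosen for illustration. Compilers targeting x86_64 commonly turn this pattern into an ADD followed by an ADC, although that depends on the compiler and its optimization settings.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned integer stored as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* 128-bit addition; the carry out of bit 127 is discarded. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;                   /* low halves, wraps modulo 2^64 */
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* high halves plus the carry from the low add */
    return r;
}
```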
Source: Is it possible to use SSE and SSE2 to make a 128-bit wide integer?