In x86_64 I know that the mul and div opcodes support 128-bit integers by putting the lower 64 bits in the rax register and the upper 64 bits in the rdx register. I was looking for some sort of intrinsic to do this in the Intel intrinsics guide and could not find one. I am writing a big number library where the word size is 64 bits. Right now I am doing division by a single word like this:
int ubi_div_i64(ubigint_t* a, ubi_i64_t b, ubi_i64_t* rem)
{
    if(b == 0)
        return UBI_MATH_ERR;
    ubi_i64_t r = 0;
    for(size_t i = a->used; i-- > 0;)
    {
        ubi_i64_t out;
        __asm__("\t"
                "div %[d] \n\t"
                : "=a"(out), "=d"(r)
                : "a"(a->data[i]), "d"(r), [d]"r"(b)
                : "cc");
        a->data[i] = out;
        //ubi_i128_t top = (r << 64) + a->data[i];
        //r = top % b;
        //a->data[i] = top / b;
    }
    if(rem)
        *rem = r;
    return ubi_strip_leading_zeros(a);
}
It would be nice if I could use something from the x86intrin.h header instead of inline asm.
GCC has the __int128 and unsigned __int128 types (also spelled __int128_t and __uint128_t).
Arithmetic with them should be using the right assembly instructions when they exist; I've used them in the past to get the upper 64 bits of a product, although I've never used it for division. If it's not using the right ones, submit a bug report / feature request as appropriate.
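As a rough illustration of the approach described above, here is a minimal sketch using unsigned __int128. The helper names mul_hi_u64 and div_step_u128 are made up for this example and are not part of any library. Note that for the division, GCC may call a runtime helper (such as __udivti3) rather than emit a single div, because it cannot prove that the quotient fits in 64 bits; check the generated assembly before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: upper 64 bits of the full 128-bit product a * b. */
static uint64_t mul_hi_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Hypothetical helper: one step of dividing a multi-word number by a
   single word. Divides the 128-bit value (hi:lo) by d, returns the
   quotient and stores the remainder in *rem.
   Assumes d != 0 and hi < d, so the quotient fits in 64 bits. */
static uint64_t div_step_u128(uint64_t hi, uint64_t lo, uint64_t d,
                              uint64_t *rem)
{
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    *rem = (uint64_t)(n % d);
    return (uint64_t)(n / d);
}
```

In the loop from the question, the inline asm would then become something like a->data[i] = div_step_u128(r, a->data[i], b, &r), since the running remainder r is always smaller than b.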
Last I looked into it, the intrinsics were in a state of flux. The main reason for the intrinsics in this case appears to be that MSVC in 64-bit mode does not allow inline assembly.
With MSVC (and I think ICC) you can use _umul128 for mul and _mulx_u64 for mulx. These don't work in GCC, at least not GCC 4.9 (_umul128 is much older than GCC 4.9). I don't know if GCC plans to support these since you can get mul and mulx indirectly through __int128 (depending on your compile options) or directly through inline assembly.
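One way to paper over the compiler difference is a small wrapper that picks _umul128 on 64-bit MSVC and falls back to unsigned __int128 elsewhere. This is only a sketch: the name mul_64x64 is invented for the example, and the preprocessor test assumes that _MSC_VER together with _M_X64 is a good enough indicator that _umul128 is available.

```c
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>   /* declares _umul128 on 64-bit MSVC */
#endif

/* Hypothetical helper: full 64x64 -> 128-bit unsigned product.
   Returns the low 64 bits and stores the high 64 bits in *hi. */
static uint64_t mul_64x64(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(_MSC_VER) && defined(_M_X64)
    return _umul128(a, b, hi);
#else
    /* GCC and Clang: let the compiler pick the multiply instruction. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
#endif
}
```

Whether the fallback compiles to mul or mulx depends on the target options (for example whether BMI2 is enabled).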
__int128 works fine until you need a larger type and a 128-bit carry. Then you need adc, adcx, or adox, and these are even more of a problem with intrinsics. Intel's documentation disagrees with MSVC, and the compilers don't seem to produce adox yet with these intrinsics. See this question: _addcarry_u64 and _addcarryx_u64 with MSVC and ICC.
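To make the carry problem concrete, here is a sketch of a multi-word addition written in plain C, using unsigned __int128 only to capture the carry out of each 64-bit limb. The function name add_limbs and the least-significant-first limb order are assumptions made for this example. Nothing here guarantees that the compiler turns the loop into an adc chain, which is exactly why inline assembly can still be attractive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b over n 64-bit limbs, least significant
   limb first. Returns the carry out of the top limb (0 or 1). */
static unsigned add_limbs(uint64_t *r, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* The 128-bit sum cannot overflow: at most 2 * (2^64 - 1) + 1. */
        unsigned __int128 s = (unsigned __int128)a[i] + b[i] + carry;
        r[i] = (uint64_t)s;           /* low 64 bits are the result limb */
        carry = (unsigned)(s >> 64);  /* bit 64 is the carry into the next limb */
    }
    return carry;
}
```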
Inline assembly is probably the best solution with GCC (and probably even ICC).