Yeah thanks that works. @PeterCordes. Also __int128
works. But one more thing as you said using the intrinsics of multiprecision arithmetic that is _addcarry_u64
in C, using the header file immintrin.h
I have the following code
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <immintrin.h>
unsigned char _addcarry_u64(unsigned char c_in, uint64_t src1, uint64_t src2,uint64_t *sum);
int main()
{
unsigned char carry;
uint64_t sum;
long long int c1=0,c2=0;
uint64_t a=0x0234BDFA12CD4379,b=0xA8DB4567ACE92B38;
carry = _addcarry_u64(0,a,b,&sum);
printf("sum is %lx and carry value is %u n",sum,carry);
return 0;
}
Can you please point me out the error? I'm getting undefined reference to _addcarry_u64
. Some quick google doesn't answer the problem , if any other header file to be used or it is not compatible with gcc and why so
Initially I had this code for adding two 64 bit numbers:
static __inline int is_digit_lessthan_ct(digit_t x, digit_t y)
{ // Is x < y?
return ( int)((x ^ ((x ^ y) | ((x - y) ^ y))) >> (RADIX-1));
}
#define ADDC(carryIn, addend1, addend2, carryOut, sumOut) \
{ digit_t tempReg = (addend1) + (int)(carryIn); \
(sumOut) = (addend2) + tempReg; \
(carryOut) = (is_digit_lessthan_ct(tempReg, (int)(carryIn)) | is_digit_lessthan_ct((sumOut), tempReg)); \
}
Now I got to know that the speed of this implementation can be improved using assembly language. So I am trying to do something similar however I cannot access or return the carry. Here is my code :
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
uint64_t add32(uint64_t a,uint64_t b)
{
uint64_t d=0,carry=0;
__asm__("mov %1,%%rax\n\t"
"adc %2,%%rax\n\t"
"mov %%rax,%0\n\t"
:"=r"(d)
:"r"(a),"r"(b)
:"%rax"
);
return d;
}
int main()
{
uint64_t a=0xA234BDFA12CD4379,b=0xA8DB4567ACE92B38;
printf("Sum = %lx \n",add32(a,b));
return 0;
}
The result of this addition should be 14B100361BFB66EB1, where the initial 1 in msb is the carry. I want to save that carry in another register. I tried jc, but I'm getting some or the other error. Even setc gave me error, may be because I'm not sure of the syntax. So can anyone tell me how to save the carry in another register or return it by modifying this code?
As usual, inline asm is not strictly necessary. https://gcc.gnu.org/wiki/DontUseInlineAsm. But currently compilers kinda suck for actual extended-precision addition, so you might want asm for this.
There's an Intel intrinsic for
adc
:_addcarry_u64
. But gcc and clang may make slow code., unfortunately. In GNU C on a 64-bit platform, you could just useunsigned __int128
.Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that
carry_out = (x+y) < x
, where<
is an unsigned compare. For example:gcc7.2 -O3 emits this (and clang emits similar code):
There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well I guess if
mov
wasn't zero latency, doing theadd
first would shorten the latency to carry being ready. But on Intel CPUs, it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)Clang will even use
adc
to use the carry-out from an add as the carry-in to another add, but only for the first limb. Perhaps because: Update: this function is broken:carry_out = (x+y) < x
doesn't work when there's carry-in. Withcarry_out = (x+y+c_in) < x
,y+c_in
can wrap to zero and give you(x+0) < x
(false) even though there was carry.Notice that clang's
cmp
/adc reg,0
exactly implements the behaviour of the C, which isn't the same as anotheradc
there.Anyway, gcc doesn't even use
adc
the first time, when it is safe. (So useunsigned __int128
for code that doesn't suck, and asm for integers even wider than that).This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.