可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a C function:
int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
/* should return (a * b * c)/d */
}
It is possible for a to be near INT64_MAX, but for the final result not to overflow, for instance if b = 1, c = d = 40. However, I am having trouble figuring out how to compute this so that I never lose data to rounding (by doing the division first) or have an intermediate result overflow.
If I had access to a large enough datatype to fit the whole product of a, b, and c, I would just do the math in that type and then truncate, but is there some way I can do this without big integers?
回答1:
Write a = q*d + r
with |r| < |d|
(I'm assuming d != 0
, otherwise the computation is meaningless anyway). Then (a*b*c)/d = q*b*c + (r*b*c)/d
. If q*b*c
overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. r*b*c
might still overflow, so we again use the same method to avoid overflow,
int64_t q = a/d, r = a%d;
int64_t part1 = q*b*c;
int64_t q1 = (r*b)/d, r1 = (r*b)%d;
return part1 + q1*c + (r1*c)/d;
回答2:
I would suggest finding the greatest common divisor of d with each of a, b, and c,
dividing out the common factors as you go:
common = gcd(a,d) // You can implement GCD using Euclid's algorithm
a=a/common
d=d/common
common = gcd(b,d)
b=b/common
d=d/common
common = gcd(c,d)
c=c/common
d=d/common
Then calculate a*b*c/d
with all the common factors removed. Euclid's GCD algorithm
runs in logarithmic time, so this should be fairly efficient.
回答3:
It's easy to see that some inputs would produce outputs that cannot be represented by the return type int64_t
. For example, fn(INT64_MAX, 2, 1, 1)
. However, the following approach will should allow you to return a correct answer for any combination of inputs that does in fact fit within the range of int64_t
.
int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
/* find the integer and remainder portions of a/d */
int64_t leftI = a / d;
int64_t leftR = a % d;
/* multiply the integer portion of the result by b and c */
int64_t resultI = leftI * b * c;
/* multiply the remainder portion by b */
int64_t resultR = leftR * b;
resultI = resultI + (resultR / d) * c;
/* multiply the remainder portion by c */
resultR = (resultR % d) * c;
return resultI + (resultR / d);
}
回答4:
If you are working on x86_64 then the asm supports 128 bit integers:
int64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
asm (
"mulq %1\n" // a *= b
"movq %%rbx, %%rdx\n"// rbx = upper 64 bit of the multiplication
"mulq %2\n" // multiply the lower 64 bits by c
"push %%rax\n" // temporarily save the lowest 64 bits on the stack
"mov %%rcx, %%rdx\n" // rcx = upper 64 bits of the multiplication
"movq %%rax, %%rbx\n"//
"mulq %2\n" // multiply the upper 64 bits by c
"addq %%rax, %%rcx\n"// combine the middle 64 bits
"addcq %%rdx, $0\n" // transfer carry tp the higest 64 bits if present
"divq %3\n" // divide the upper 128 (of 192) bits by d
"mov %%rbx, %%rax\n" // rbx = result
"pop %%rax\n"
"divq %3\n" // divide remainder:lower 64 bits by d
: "+a" (a) // assigns a to rax register as in/out
, "+b" (b) // assigns b to rbx register
: "g" (c) // assigns c to random register
, "g" (d) // assigns d to random register
: "edx", "rdx" // tells the compiler that edx/rdx will be used internally, but does not need any input
);
// b now holds the upper 64 bit if (a * b * c / d) > UINT64_MAX
return a;
}
Please note that all input integers have to be the same length. Working length will be double the input. Works with unsigned only.
The native div
and mul
instructions on x86 work on double-length exactly to allow for overflows. Sadly I am unaware of a compiler intrinsic to make use of them.