Compiling this:
    #include <iostream>

    int main()
    {
        for (int i = 0; i < 4; ++i)
            std::cout << i*1000000000 << std::endl;
    }
and gcc produces the following warning:
    warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
         std::cout << i*1000000000 << std::endl;
                        ^
I understand there is a signed integer overflow. What I cannot understand is why the value of i is broken by that overflow operation?
I've read the answers to Why does integer overflow on x86 with GCC cause an infinite loop?, but I'm still not clear on why this happens - I get that "undefined" means "anything can happen", but what's the underlying cause of this specific behavior?
Online: http://ideone.com/dMrRKR
Compiler: gcc (4.8)
tl;dr: The code generates a test that integer + positive integer == negative integer. Usually the optimizer does not optimize this out, but in the specific case of std::endl being used next, the compiler does optimize this test out. I haven't figured out what's special about endl yet.

From the assembly code at -O1 and higher levels, it is clear that gcc refactors the loop to:
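In rough C++ terms, the transformation amounts to the following sketch (writePos is a hypothetical name; the counter is pre-multiplied and the exit test becomes an equality comparison against 4 * 1000000000, which wraps to a negative 32-bit value):

    #include <iostream>

    int main()
    {
        // What the optimized code effectively does, assuming wraparound:
        int writePos = 0;
        do {
            std::cout << writePos << std::endl;
            writePos += 1000000000;
        } while (writePos != -294967296);   // 4 * 1000000000, wrapped to 32-bit signed
    }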
The biggest value that works correctly is 715827882, i.e. floor(INT_MAX/3). The assembly snippet at -O1 is:
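The interesting part of the loop looks something like this sketch (the call sequence and register shuffling are elided; the two constants are the ones discussed next):

    .L2:
            # ... calls to operator<<(int) and std::endl elided ...
            addl    $715827882, %esi         # i += 715827882
            cmpl    $-1431655768, %esi       # compare against 4 * 715827882, wrapped
            jne     .L2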
Note, the -1431655768 is 4 * 715827882 in 2's complement.

Hitting -O2 optimizes that to the following:
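Roughly (again a sketch with the calls elided), the same loop with the increment hoisted earlier in the body:

    .L2:
            addl    $715827882, %esi         # the addl now comes earlier
            # ... calls printing the (previously copied) value elided ...
            cmpl    $-1431655768, %esi
            jne     .L2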
So the optimization that has been made is merely that the addl was moved higher up.

If we recompile with 715827883 instead, then the -O1 version is identical apart from the changed number and test value. However, -O2 then makes a change.
Where there was cmpl $-1431655764, %esi at -O1, that line has been removed for -O2. The optimizer must have decided that adding 715827883 to %esi can never equal -1431655764.
This is pretty puzzling. Adding that to INT_MIN+1 does generate the expected result, so the optimizer must have decided that %esi can never be INT_MIN+1, and I'm not sure why it would decide that.

In the working example, it seems it'd be equally valid to conclude that adding 715827882 to a number cannot equal INT_MIN + 715827882 - 2! (This is only possible if wraparound does actually occur.) Yet it does not optimize the line out in that example.

The code I was using is:
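In essence (a reconstruction; the key points, which the discussion below relies on, are the 715827883 multiplier and calling std::endl directly as a function):

    #include <iostream>

    int main()
    {
        for (int i = 0; i < 4; ++i)
        {
            std::cout << i * 715827883;   // 715827882 for the working variant
            std::endl(std::cout);         // endl invoked directly as a function
        }
    }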
If the std::endl(std::cout) is removed then the optimization no longer occurs. In fact, replacing it with std::cout.put('\n'); std::flush(std::cout); also causes the optimization to not happen, even though std::endl is inlined.

The inlining of std::endl seems to affect the earlier part of the loop structure (which I don't quite understand what it is doing, but I'll note the difference here in case someone else does): comparing the -O2 output for the original code with the -O2 output after my manual inlining of std::endl, one difference between the two is that %esi is used in the original and %ebx in the second version; is there any difference in semantics defined between %esi and %ebx in general? (I don't know much about x86 assembly.)

It seems that integer overflow occurs in the 4th iteration (for i = 3).
Signed integer overflow invokes undefined behavior, and in that case nothing can be predicted. The loop may iterate only 4 times, or it may run forever, or anything else may happen! The result may vary from compiler to compiler, or even between different versions of the same compiler.

C++11 §1.3.24 defines undefined behavior as

    behavior for which this International Standard imposes no requirements
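For contrast, a variant with no overflow is fully predictable; one way is to do the multiplication in a wider type:

    #include <iostream>

    int main()
    {
        for (int i = 0; i < 4; ++i)
            // the multiplication is done in long long, so it never overflows
            // and the loop is guaranteed to run exactly 4 times
            std::cout << static_cast<long long>(i) * 1000000000 << std::endl;
    }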
Short answer: gcc specifically has documented this problem, and we can see that in the gcc 4.8 release notes, which say (emphasis mine going forward):

    GCC now uses a more aggressive analysis to derive an upper bound for the number of iterations of loops using constraints imposed by language standards. This may cause non-conforming programs to no longer work as expected, such as SPEC CPU 2006 464.h264ref and 416.gamess. A new option, -fno-aggressive-loop-optimizations, was added to disable this aggressive analysis.

And indeed, if we use -fno-aggressive-loop-optimizations the infinite-loop behavior should cease, and it does in all the cases I have tested.

The long answer starts with knowing that signed integer overflow is undefined behavior, by looking at the draft C++ standard section 5 Expressions paragraph 4, which says:

    If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

We know that the standard says undefined behavior is unpredictable from the note that comes with the definition, which says:

    [...] Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
But what in the world can the gcc optimizer be doing to turn this into an infinite loop? It sounds completely wacky, but thankfully gcc gives us a clue in the warning it emits. The clue is -Waggressive-loop-optimizations: what does that mean? Fortunately for us, this is not the first time this optimization has broken code in this way, and we are lucky because John Regehr has documented a case in the article GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks, which shows the following code:
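As it appears in the article (the SATD routine from the 464.h264ref benchmark):

    int d[16];

    int SATD (void)
    {
      int satd = 0, dd, k;
      for (dd = d[k = 0]; k < 16; dd = d[++k]) {
        satd += (dd < 0 ? -dd : dd);   // the d[++k] in the loop header is the problem
      }
      return satd;
    }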
The article explains that the undefined behavior is the read of d[16] by dd = d[++k] on the final iteration, one element past the end of the array, and later on notes that the gcc 4.8 pre-release exploited exactly this: since the out-of-bounds access is assumed never to happen, the compiler may conclude that k < 16 always holds, turning SATD into an infinite loop.
So what the compiler must be doing in some cases is assuming that, since signed integer overflow is undefined behavior, i must always be less than 4, and thus we have an infinite loop.

He explains this is very similar to the infamous Linux kernel null pointer check removal, where in seeing code like this:
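A minimal sketch of the pattern (identifiers are illustrative, not the actual kernel source):

    struct foo { int f; };

    int read_f(struct foo *s)
    {
        int x = s->f;   /* s is dereferenced here...                   */
        if (!s)         /* ...so the optimizer deletes this null check */
            return -1;
        return x;
    }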
gcc inferred that since s was dereferenced in s->f, and since dereferencing a null pointer is undefined behavior, s must not be null, and it therefore optimizes away the if (!s) check on the next line.

The lesson here is that modern optimizers are very aggressive about exploiting undefined behavior, and will most likely only get more aggressive. Clearly, with just a few examples, we can see the optimizer doing things that seem completely unreasonable to a programmer but that, in retrospect, make sense from the optimizer's perspective.
Signed integer overflow (strictly speaking, there is no such thing as "unsigned integer overflow") means undefined behaviour. And this means anything can happen, so discussing why it happens under the rules of C++ doesn't make sense.
C++11 draft N3337, §5.4: [1]

    If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Your code compiled with g++ -O3 emits a warning even without -Wall (the same -Waggressive-loop-optimizations warning quoted in the question).

The only way we can analyze what the program is doing is by reading the generated assembly code.
Here is the loop from the generated assembly listing:
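Sketched (the printing calls are elided; what matters is the increment and the missing exit test):

    .L2:
            # ... calls printing the value and std::endl elided ...
            addl    $1000000000, %edi        # i += 1000000000
            jmp     .L2                      # unconditional: the exit test is gone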
I can barely even read assembly, but even I can see the addl $1000000000, %edi line. The resulting code looks more like this:
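In C++ terms, a sketch of the equivalent:

    for (int i = 0; /* no test: loops forever */; i += 1000000000)
        std::cout << i << std::endl;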
A comment by @T.C. gave me the idea to compare the assembly code of the OP's code to the assembly code of the following code, which has no undefined behaviour:
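Something along these lines, with the trip count reduced so that the multiplication stays in range:

    #include <iostream>

    int main()
    {
        // i*1000000000 reaches at most 2000000000 < INT_MAX: no overflow
        for (int i = 0; i < 3; ++i)
            std::cout << i*1000000000 << std::endl;
    }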
And, in fact, the correct code has a termination condition.
Deal with it, you wrote the buggy code and you should feel bad. Bear the consequences.
...or, alternatively, make proper use of better diagnostics and better debugging tools - that's what they are for (example invocations follow the list):

- enable all warnings
  - -Wall is the gcc option that enables all useful warnings with no false positives. This is a bare minimum that you should always use.
  - other warning options are not included in -Wall, as they may warn on false positives
- use debug flags for debugging
  - -ftrapv traps the program on overflow
  - -fcatch-undefined-behavior catches a lot of instances of undefined behaviour (note: "a lot of" != "all of them")
- use gcc's -fwrapv
  - this makes signed integer overflow well defined: it wraps around using two's-complement arithmetic (at the cost of some optimizations)
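For example (overflow.cpp is a hypothetical file name for the code in the question):

    g++ -Wall -O2 overflow.cpp           # warns: iteration 3u invokes undefined behavior
    g++ -Wall -O2 -ftrapv overflow.cpp   # the binary aborts when the overflow actually happens
    g++ -Wall -O2 -fwrapv overflow.cpp   # overflow wraps: the loop prints 4 lines and exits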
[1] This rule does not apply to "unsigned integer overflow", as §3.9.1.4 says that

    Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer

and e.g. the result of UINT_MAX + 1 is mathematically defined: by the rules of arithmetic modulo 2^n.

Another example of this error being reported by gcc is when you have a loop that executes for a constant number of iterations, but you are using the counter variable as an index into an array that has fewer than that number of items, such as:
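A minimal sketch of that situation:

    int a[3];   // 3 elements, indices 0..2

    int main()
    {
        for (int i = 0; i < 4; ++i)   // 4 iterations...
            a[i] = i;                 // ...so a[3] is written: out of bounds
    }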
The compiler can determine that this loop will try to access memory outside of the array 'a', and it complains about this with the same rather cryptic message:
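For the sketch above, the message takes the same form as the warning in the question:

    warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]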