Say I have the following C code:
int32_t foo(int32_t x) {
return x + 1;
}
This is undefined behavior when x == INT_MAX. Now say I performed the addition with inline assembly instead:
int32_t foo(int32_t x) {
asm("incl %0" : "+g"(x));
return x;
}
Question: Does the inline assembly version still invoke undefined behavior when x == INT_MAX? Or does undefined behavior only apply to the C code?
No, there's no UB with this. C rules don't apply to the asm instructions themselves. As for the inline-asm syntax wrapping the instructions, that's a well-defined language extension with defined behaviour on implementations that support it.
See Does undefined behavior apply to asm code? for a more generic version of this question (vs. this one about x86 assembly and the GNU C inline asm language extension). The answers there focus on the C side of things, with quotes from the C and C++ standards that document how little the standard has to say about implementation-defined extensions to the language.
See also this comp.lang.c thread for arguments about whether it makes sense to say it has UB "in general" because not all implementations have that extension.
BTW, if you just want signed wraparound with defined 2's complement behaviour in GNU C, compile with -fwrapv. Don't use inline asm. (Or use an __attribute__ to enable that option for just the function that needs it.) -fwrapv is not quite the same thing as -fno-strict-overflow, which merely disables optimizations based on assuming the program doesn't have any UB; for example, overflow in compile-time-constant calculations is only safe with -fwrapv.
Inline-asm behaviour is implementation defined, and GNU C inline asm is defined as a black box for the compiler. Inputs go in, outputs come out, and the compiler doesn't know how. All it knows is what you tell it using the out/in/clobber constraints.
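As a rough sketch of both points above, here is the increment written once in plain C with unsigned arithmetic (which wraps by definition) and once through GNU C inline asm, where the compiler only sees the read-write operand it is told about. The names inc_wrap and inc_asm are made up for illustration, and the asm path is guarded so the block still compiles on non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical helper: wrapping increment without inline asm.
 * Unsigned arithmetic wraps modulo 2^32 by definition. Converting the
 * out-of-range result back to int32_t is implementation-defined in
 * ISO C before C23; GNU C defines it as two's complement wraparound. */
static int32_t inc_wrap(int32_t x)
{
    return (int32_t)((uint32_t)x + 1u);
}

/* Hypothetical helper: the same increment through GNU C inline asm. */
static int32_t inc_asm(int32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r" tells the compiler x is both read and written, in a register.
     * The instruction itself is a black box to the optimizer. */
    __asm__("incl %0" : "+r"(x));
    return x;
#else
    /* Assumption: on other targets, fall back to the portable version. */
    return inc_wrap(x);
#endif
}
```

Both functions should return INT32_MIN for an input of INT32_MAX, but only inc_wrap stays visible to constant propagation and other optimizations.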
Your foo that uses inline-asm behaves identically to a C version that does the addition with wrapping arithmetic on x86, because x86 is a 2's complement machine, so integer wraparound is well-defined. (Except for performance: the asm version defeats constant propagation, and also gives the compiler no ability to optimize x - inc(x) to -1, etc. See https://gcc.gnu.org/wiki/DontUseInlineAsm: avoid inline asm unless there's no way to coax the compiler into generating optimal asm by tweaking the C.)
The increment doesn't raise exceptions. Setting the OF flag has no impact on anything, because GNU C inline asm for x86 (i386 and amd64) has an implicit "cc" clobber, so the compiler will assume that the condition codes in EFLAGS hold garbage after every inline-asm statement. gcc6 introduced a new syntax for asm to produce flag results (which can save a SETCC in your asm and a TEST generated by the compiler for asm blocks that want to return a flag condition).
Some architectures do raise exceptions (traps) on integer overflow, but x86 is not one of them (except when a division quotient doesn't fit in the destination register). On MIPS, you'd use ADDIU instead of ADDI on signed integers if you wanted them to be able to wrap without trapping. (Because it's also a 2's complement ISA, so signed wraparound is the same in binary as unsigned wraparound.)
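To illustrate the flag-output syntax mentioned above, here is a minimal sketch that returns the OF flag from the increment directly, assuming GCC 6 or later (or another compiler that defines __GCC_ASM_FLAG_OUTPUTS__). The function name inc_reports_overflow is hypothetical, and the fallback path uses the GNU C builtin __builtin_add_overflow so the block still compiles elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: increment *x with wraparound and report whether
 * the signed result overflowed. */
static bool inc_reports_overflow(int32_t *x)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && \
    (defined(__x86_64__) || defined(__i386__))
    int32_t v = *x;
    int of;
    /* "=@cco" asks the compiler to read the overflow condition (OF)
     * left by the asm statement, so no SETO is needed in the asm. */
    __asm__("incl %0" : "+r"(v), "=@cco"(of));
    *x = v;
    return of != 0;
#else
    /* Assumption: without flag outputs, let the compiler do the check.
     * The builtin stores the wrapped result and returns true on overflow. */
    return __builtin_add_overflow(*x, 1, x);
#endif
}
```

In practice the builtin alone is usually the better choice, since it keeps the operation visible to the optimizer; the asm path is only here to show the syntax.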
Undefined (or at least implementation-dependent) Behaviour in x86 asm:
BSF and BSR (find first set bit forward or reverse) leave their destination register with undefined contents if the input was zero. (TZCNT and LZCNT don't have that problem). Intel's recent x86 CPUs do define the behaviour, which is to leave the destination unmodified, but the x86 manuals don't guarantee that. See the section on TZCNT in this answer for more discussion on the implications, e.g. that TZCNT/LZCNT/POPCNT have a false dependency on the output in Intel CPUs.
Several other instructions leave some flags undefined in some or all cases (especially AF/PF). IMUL, for example, leaves ZF, PF, and AF undefined.
Presumably any given CPU has consistent behaviour, but the point is that other CPUs might behave differently even though they're still x86. If you're Microsoft, Intel will design their future CPUs to not break your existing code. If your code is that widely-relied-on, you'd better stick to only relying on behaviour documented in the manuals, not just what your CPU happens to do. See Andy Glew's answer and comments here. Andy was one of the architects of Intel's P6 microarchitecture.
These examples are not the same thing as UB in C. They're more like what C would call "implementation defined", since we're just talking about one value that's unspecified, not the possibility of nasal demons (or, more plausibly, of modifying other registers or jumping somewhere).
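The BSF case has a direct counterpart in GNU C: __builtin_ctz is documented as having an undefined result for a zero input. A minimal sketch of a wrapper that gives zero a defined answer follows; the name ctz32_or_32 and the choice of 32 as the result for zero (matching what TZCNT produces for a 32-bit operand) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: count trailing zero bits, with a defined result
 * for x == 0 instead of relying on whatever BSF happens to leave in the
 * destination register. */
static unsigned ctz32_or_32(uint32_t x)
{
    if (x == 0)
        return 32;  /* chosen convention: all 32 bits are zero */
    return (unsigned)__builtin_ctz(x);  /* defined only for non-zero x */
}
```

The explicit zero check keeps the code within documented behaviour on every x86 CPU, rather than depending on what one microarchitecture does.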
For really undefined behaviour, you probably need to look at privileged instructions, or at least multi-threaded code. Self-modifying code is also potentially UB on x86: it's not guaranteed that the CPU "notices" stores to addresses that are about to be executed until after a jump instruction. This was the subject of the question linked above (and the answer is: real implementations of x86 go above and beyond what the x86 ISA manual requires, to support code that depends on it, and because snooping all the time is better for high-performance than flushing on jumps.)
Undefined behaviour in assembly language is pretty rare, especially if you don't count cases where a specific value is unspecified but the scope of the "damage" is predictable and limited.
Well, the C Standard doesn't define what inline assembler does, so any inline assembler is undefined behaviour according to the C Standard.
You are using a slightly different language "C with x86 32 bit inline assembler". You generated a valid assembler statement. The behaviour is presumably defined by Intel's reference manuals. And there the behaviour of an integer addition adding 1 to INT_MAX is well defined. It's defined in a way that it doesn't interfere with execution of your C program.
Inline assembler that tried to read a value via a null pointer would also be well defined on the assembler level, but its behaviour would interfere with the execution of your program (a.k.a. crashing it).