On Undefined Behavior

Published 2019-02-08 18:00

Generally, UB is regarded as something to be avoided, and the current C standard itself lists quite a few examples in Annex J.

However, there are cases where I can see no harm in exploiting UB other than sacrificing portability.

Consider the following definition:

int a = INT_MAX + 1;

Evaluating this expression leads to UB. However, if my program is intended to run on, say, a 32-bit CPU that represents values in two's complement and wraps on overflow, I'm inclined to believe I can predict the outcome.
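For concreteness, here is a minimal sketch of the experiment I have in mind; the wrap to INT_MIN is only my expectation for such a machine, not anything the standard promises:

  #include <limits.h>
  #include <stdio.h>

  int main(void) {
      int x = INT_MAX;  /* largest representable int */
      int a = x + 1;    /* undefined behavior: signed overflow */
      /* On a 32-bit two's complement CPU I *expect* this to print
         -2147483648 (INT_MIN), but the standard guarantees nothing. */
      printf("%d\n", a);
      return 0;
  }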

In my opinion, UB is sometimes just the C standard's way of telling me: "I hope you know what you're doing, because we can't make any guarantees on what will happen."

Hence my question: is it safe to sometimes rely on machine-dependent behavior, even if the C standard considers it to invoke UB, or is "UB" really to be avoided, no matter what the circumstances are?

7 Answers
一夜七次
#2 · 2019-02-08 18:12

In general, it's better to avoid it completely. On the other hand, if your compiler's documentation explicitly states that something which is UB per the standard is instead defined for that compiler, you may exploit it, possibly adding some #ifdef/#error machinery to block compilation in case another compiler is used.
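A minimal sketch of such a guard; __ACME_CC__ is a hypothetical vendor macro, standing in for whatever predefined macro your compiler's documentation actually provides:

  /* Refuse to build with any compiler other than the one whose
     documentation defines the behavior this code relies on. */
  #if !defined(__ACME_CC__)
  #error "This file relies on ACME-specific behavior; compiler unsupported."
  #endif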

做自己的国王
#3 · 2019-02-08 18:14

If a C (or other language) standard declares that some particular code will have Undefined Behavior in some situation, that means that a C compiler can generate code to do whatever it wants in that situation, while remaining compliant with that standard. Many particular language implementations have documented behaviors which go beyond what is required by the generic language standard. For example, Whizbang Compilers Inc. might explicitly specify that its particular implementation of memcpy will always copy individual bytes in address order. On such a compiler, code like:

  unsigned char z[256];
  z[0] = 0x53;
  z[1] = 0x4F;
  /* Overlapping copy: the destination z+2 overlaps the source z. If
     bytes are copied one at a time in address order, the two-byte
     pattern propagates through the whole buffer; the standard leaves
     overlapping memcpy undefined. */
  memcpy(z+2, z, 254);

would have behavior defined by the Whizbang documentation, even though its behavior is not specified by any non-vendor-specific C standard. Such code would be compatible with compilers that honor Whizbang's specification, but could be incompatible with compilers that comply with the C standard yet not with Whizbang's specification.
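For contrast, a fully portable way to get the same propagating fill is an explicit forward loop, which is well-defined on every conforming implementation (a sketch, repeating the setup above):

  unsigned char z[256];
  z[0] = 0x53;
  z[1] = 0x4F;
  /* Each iteration writes a byte two positions past the byte it
     reads, so the two-byte pattern propagates through the buffer. */
  for (int i = 0; i < 254; i++)
      z[i + 2] = z[i];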

There are many situations, especially with embedded systems, where programs need to do things which the C standards do not require compilers to allow. It is not possible to write such programs to be compatible with all standards-compliant compilers, since some standards-compliant compilers may not provide any way to do what needs to be done, and even those that do might require different syntax. Nonetheless, there is often considerable value in confining such constructs to as little code as possible and writing everything else so that any standards-compliant compiler will run it correctly.

混吃等死
#4 · 2019-02-08 18:18

If the standard says that doing something is undefined, then it is undefined. You may like to think you can predict what the outcome will be, but you can't. For a specific compiler you may always get the same result, but for the next iteration of the compiler, you may not.

And undefined behaviour is so EASY to avoid - don't write code like that! So why do people like you want to mess with it?

#5 · 2019-02-08 18:30

No, unless you're also keeping your compiler the same and your compiler documentation defines the otherwise undefined behavior.

Undefined behavior means that the compiler may assume the situation never occurs and transform your code accordingly, producing results you would never expect.
Sometimes this freedom exists to enable optimizations, and sometimes it reflects architecture restrictions.


I suggest you read the LLVM Project Blog post "What Every C Programmer Should Know About Undefined Behavior", which addresses your exact example. An excerpt:

Signed integer overflow:

If arithmetic on an int type (for example) overflows, the result is undefined. One example is that INT_MAX + 1 is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code.

For example, knowing that INT_MAX + 1 is undefined allows optimizing X + 1 > X to true. Knowing the multiplication "cannot" overflow (because doing so would be undefined) allows optimizing X * 2 / 2 to X. While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion. A more important optimization that this allows is for <= loops like this:

for (i = 0; i <= N; ++i) { ... }

In this loop, the compiler can assume that the loop will iterate exactly N + 1 times if i is undefined on overflow, which allows a broad range of loop optimizations to kick in. On the other hand, if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite (which happens if N is INT_MAX) - which then disables these important loop optimizations. This particularly affects 64-bit platforms since so much code uses int as induction variables.
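To make the X + 1 > X example above concrete, here is a toy function that a compiler is permitted, though not required, to fold to a constant:

  /* Because signed overflow is undefined, the optimizer may assume
     x + 1 never overflows; x + 1 > x is then always true, so the
     whole function may be compiled as if it were "return 1;". */
  int plus_one_is_greater(int x) {
      return x + 1 > x;
  }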

虎瘦雄心在
#6 · 2019-02-08 18:30

If you know for a fact that your code will only be targeting a specific architecture, compiler, and OS, and you know how the undefined behavior works (and that that won't change), then it's not inherently wrong to use it occasionally. In your example, I think I can tell what's going to happen as well.

However, UB is rarely the preferred solution. If there's a cleaner way, use it. Relying on undefined behavior should never be strictly necessary, though it can be convenient in a few cases. Avoid it wherever you can, and if you ever do rely on UB, comment it clearly.
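For instance, a sketch of what such a comment might look like at the point of reliance (the function itself is made up for illustration):

  /* WARNING: deliberately relies on two's complement wrap-around on
     signed overflow, which is undefined behavior in ISO C. Verified
     only on our 32-bit target with this exact toolchain. */
  int wrapping_add(int a, int b) {
      return a + b;  /* UB in ISO C if the sum overflows */
  }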

And please, don't ever publish code that relies on undefined behavior, because it'll just end up blowing up in someone's face when they compile it on a system with a different implementation than the one that you relied on.

Summer. ? 凉城
#7 · 2019-02-08 18:32

No! Just because it compiles, runs and gives the output you hoped for does not make it correct.
