How undefined is undefined behavior?

Posted 2019-01-03 16:28

I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.

Let's say I have this code:

#include <stdio.h>

int main()
{
    int v = 0;
    scanf("%d", &v);
    if (v != 0)
    {
        int *p;
        *p = v;  // Oops
    }
    return v;
}

Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?

8 answers
【Aperson】
#2 · 2019-01-03 17:03

When you declare a local variable such as a pointer, storage is reserved for it, but whatever value was previously stored there is not cleared (this depends on the implementation; some compilers may fill the storage with zeroes). So your int *p holds an indeterminate (junk) value, which is then interpreted as an address: the place in memory that p points to (p's pointee). When you dereference p, that address is usually not memory your program is allowed to write to, so trying to modify it will typically result in an access violation raised by the memory manager.

So in this example, any value other than 0 will result in undefined behavior, because no one knows what p will point to at that moment.
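For contrast, here is a minimal sketch of the same branch with the pointer aimed at a real object before it is dereferenced, which makes the store well defined for every input. The helper name store_if_nonzero and the variable target are made up for illustration and do not appear in the question.

```c
/* Hypothetical helper: same shape as the branch in the question,
   but p is initialized to point at an object that actually exists. */
static int store_if_nonzero(int v)
{
    int target = 0;          /* the object p will point to */
    if (v != 0)
    {
        int *p = &target;    /* p now holds a valid address */
        *p = v;              /* well-defined store through p */
    }
    return target;           /* 0 when v == 0, otherwise v */
}
```

Calling store_if_nonzero(0) returns 0 and store_if_nonzero(7) returns 7; neither call depends on what happened to be in memory beforehand.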

I hope this explanation is of any help.

Edit: Ah, sorry, a few answers got in ahead of me again :)

欢心
#3 · 2019-01-03 17:09

Let me give an argument for why I think this is still undefined.

First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:

#include <stdio.h>

int
main()
{
    int v = 0;
    scanf("%d", &v);
    if (v != 0)
    {
        printf("Hello\n");
        int *p;
        *p = v;  // Oops
    }
    return v;
}

What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is zero and simply not emit any of the bracketed code at all, including the printf.

If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:

int test(int x) { return x+1 > x; }

Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.

In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
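As a hedged sketch of how the same comparison could be written so that it is defined for every int, the version below rules out INT_MAX before adding. The name test_checked is hypothetical, and returning 0 for INT_MAX is an assumption about what the caller wants to see, not something any compiler is required to produce from the original test.

```c
#include <limits.h>

/* Hypothetical counterpart of test(): never evaluates INT_MAX + 1,
   so there is no overflow for the compiler to reason about. */
static int test_checked(int x)
{
    if (x == INT_MAX)
        return 0;        /* x + 1 is not representable as int */
    return x + 1 > x;    /* safe: x + 1 cannot overflow here */
}
```

With this version, test_checked(INT_MAX) yields 0 and every other input yields 1, because the addition is only evaluated when it cannot overflow.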

Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.

OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.

And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.

等我变得足够好
#4 · 2019-01-03 17:10

If v is 0, the assignment through the uninitialized pointer never gets executed and the function returns zero, so the behaviour is not undefined.

Deceive 欺骗
#5 · 2019-01-03 17:16

I'd say that the behavior is undefined only if the user enters a number other than 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).

A hint of this can be found in the standard, at 3.4.3:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.


Additional example: integer overflow. Any program that adds user-provided values without thoroughly checking them first is subject to this kind of undefined behavior, but the addition is UB only when the user actually provides data that overflows.
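The kind of check mentioned above can be sketched roughly as follows. The helper name checked_add and its return convention (1 for success, 0 for "would overflow") are assumptions made for this illustration; the point is only that the range test happens before the addition, so the overflowing sum is never evaluated.

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *out and returns 1 when the sum
   is representable as int; returns 0 and leaves *out untouched otherwise. */
static int checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return 0;    /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 0;    /* a + b would fall below INT_MIN */
    *out = a + b;    /* safe: the sum is known to fit */
    return 1;
}
```

A caller would test the return value before using the result, for example rejecting the input when checked_add(INT_MAX, 1, &sum) returns 0.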

贼婆χ
#6 · 2019-01-03 17:17

Since this has the tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.

The program defines main as

int main()
{
    /* ... */
}

C99 5.1.2.2.1 says that the main function shall be defined either as

int main(void) { /* ... */ }

or as

int main(int argc, char *argv[]) { /* ... */ }

or equivalent; or in some other implementation-defined manner.

int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as

main(42);

is a constraint violation if you use int main(void), but not if you use int main().

For example, these two programs:

int main() {
    if (0) main(42); /* not a constraint violation */
}


int main(void) {
    if (0) main(42); /* constraint violation, requires a diagnostic */
}

are not equivalent.

If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.

This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).

In practice, every compiler I've seen accepts int main() without complaint.

To answer the question that was intended:

Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.

一纸荒年 Trace。
#7 · 2019-01-03 17:21

It is simple. If a piece of code doesn't execute, it doesn't have a behavior at all, whether defined or not.

If the input is 0, then the code inside the if doesn't run, so whether the behavior is defined depends on the rest of the program (in this case it is defined).

If the input is not 0, you execute code that we all know is a case of undefined behavior.
