I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
    #include <stdio.h>

    int main()
    {
        int v = 0;
        scanf("%d", &v);
        if (v != 0)
        {
            int *p;
            *p = v; // Oops
        }
        return v;
    }
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?
When you declare a local variable (a pointer included), storage is reserved for it, but whatever was stored there before is not cleared (an implementation may happen to fill it with zeroes, but nothing guarantees that). So your int *p holds an indeterminate value (junk), which is then interpreted as an address. That address is the place in memory that p points to (p's pointee). When you try to dereference p (that is, access that piece of memory), the address will most likely be unmapped or not writable by your process, so on typical systems the write is trapped as an access violation (a segmentation fault). It may also happen to land in valid memory and silently corrupt it.
So in this example, any value other than 0 results in undefined behavior, because no one knows what p points to at that moment. I hope this explanation helps.
Edit: Ah, sorry, a few answers got in ahead of mine again :)
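To make the uninitialized-pointer point concrete, here is a minimal sketch of the same logic with p pointed at a real object before it is dereferenced. The function name store_if_nonzero and the variable target are hypothetical and not part of the question's code.

```c
/* Hypothetical helper (not from the question): same shape as the original
   program, but p is given a valid target before it is dereferenced. */
int store_if_nonzero(int v)
{
    int target = 0;     /* a real object for p to point to */
    int *p = &target;   /* initialized, so *p designates target */

    if (v != 0)
    {
        *p = v;         /* well defined: writes to target */
    }
    return target;
}
```

With the pointer initialized, the store is defined for every input, so the question of which inputs trigger undefined behavior no longer arises.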
Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
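A minimal sketch of such a modification, assuming it amounts to a printf of "Hello" placed just before the faulty store. The function name modified_example is hypothetical; in the original program this would be the body of main, with v read by scanf.

```c
#include <stdio.h>

/* Sketch of the modified example; the function name is hypothetical and
   stands in for the body of main, with v being the value read by scanf. */
int modified_example(int v)
{
    if (v != 0)
    {
        printf("Hello\n");  /* looks as if it must run before the bad store */
        int *p;             /* never initialized */
        *p = v;             /* undefined behavior whenever this is reached */
    }
    return v;
}
```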
What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
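A sketch of the kind of function meant here. Only its name, test, and its observed behavior are given in the text, so the body is an assumption chosen to match the reasoning about x+1 and x.

```c
/* Sketch of the kind of function meant here (an assumption, since only
   its name, test, and its behavior are described in the text). */
int test(int x)
{
    /* For x == INT_MAX, x + 1 overflows a signed int: undefined behavior.
       For every other x the comparison is true. */
    return x + 1 > x;
}
```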
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.
If v is 0, the assignment through your uninitialized pointer never gets executed, and the function will return zero, so it is not undefined behaviour.
I'd say that the behavior is undefined only if the user enters a number other than 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).
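A hint of this can be found in the standard, at 3.4.3, which defines undefined behavior as "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements".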
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that adds user-provided numbers without checking them extensively is subject to this kind of undefined behavior, but the addition is UB only when the user provides such particular data.
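As an illustration of the kind of check meant here, the following is a minimal sketch of an addition that tests the operands before adding, so the overflowing case is never evaluated. The helper name checked_add and its interface are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only when the result fits in an int.
   Returns true and stores the sum in *out on success; returns false and
   leaves *out untouched if the addition would overflow. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;   /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;   /* a + b would fall below INT_MIN */
    *out = a + b;       /* safe: the range was checked above */
    return true;
}
```

The range tests themselves cannot overflow, because INT_MAX - b is only computed for positive b and INT_MIN - b only for negative b.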
Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as int main(). C99 5.1.2.2.1 says that the main function shall be defined either as int main(void) { /* ... */ } or as int main(int argc, char *argv[]) { /* ... */ } or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main that passes an argument, such as main(42);, is a constraint violation if you use int main(void), but not if you use int main().
For example, a program that defines int main() { if (0) main(42); } and one that defines int main(void) { if (0) main(42); } are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply to that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
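Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.
It is simple. If a piece of code doesn't execute, it doesn't have a behavior, whether defined or not.
If the input is 0, then the code inside the if doesn't run, so whether the behavior is defined depends on the rest of the program (in this case it is defined). If the input is not 0, you execute code that we all know is a case of undefined behavior.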
doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).If input is not 0, you execute code that we all know is a case of undefined behavior.