I am curious as to what part of dereferencing a NULL pointer causes undesired behavior. Example:
// #1
someObj * a;
a = NULL;
(*a).somefunc(); // crash: dereferenced a null ptr and called one of its functions
// same as a->somefunc();
// #2
someObj * b;
anotherObj * c;
b = NULL;
c->anotherfunc(*b); // dereferenced the ptr, but didn't call one of its functions
Here we see in #2 that I didn't actually try to access data or a function out of b, so would this still cause undesired behavior if *b just resolves to NULL and we're passing NULL into anotherfunc() ?
NULL is just 0. Since 0 doesn't point to a real memory address, you can't dereference it. *b can't just resolve to NULL, since NULL is something that applies to pointers, not objects.
You need to know more about anotherfunc() to tell what will happen when you pass it null. It might be fine, it might crash; it depends on the code.
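Whether case #2 misbehaves depends on how anotherfunc() is declared, which the question does not show. The following is only a minimal sketch under assumed declarations: SomeObj, by_value, by_reference, and call_checked are hypothetical names, not taken from the question. It shows one common way to stay out of trouble, which is to test the pointer before writing *p at all.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's someObj.
struct SomeObj {
    int value = 7;
};

// Takes a copy: the argument object is read at the call site to make the copy.
int by_value(SomeObj copy) { return copy.value; }

// Takes a reference: nothing is read until ref.value is evaluated.
int by_reference(const SomeObj& ref) { return ref.value; }

// Checks the pointer first, so *p is only formed for a real object.
// Returns -1 (an arbitrary sentinel chosen for this sketch) when p is null.
int call_checked(const SomeObj* p) {
    if (p == nullptr) {
        return -1;
    }
    return by_value(*p) + by_reference(*p);
}

// Small usage example; call it from main() to run the checks.
void demo_call_checked() {
    SomeObj obj;
    assert(call_checked(&obj) == 14);     // 7 + 7
    assert(call_checked(nullptr) == -1);  // never dereferenced
}
```

With the check in place, neither by_value nor by_reference ever sees the result of dereferencing a null pointer, so how they are declared stops mattering.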
Whether or not the mere fact of dereferencing a null pointer already results in undefined behavior is currently a gray zone in the Standard, unfortunately. What is certain is that reading a value out of the result of dereferencing a null pointer is undefined behavior.
That it is undefined behavior is stated by various notes throughout the Standard. But notes are not normative: they could say anything, but they can never state any rules. Their purpose is purely informative.
Calling a member function on a null pointer is formally undefined behavior too.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it is given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. This, however, is a limited area where there exists an exception (no pun intended) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtle, and with a reference to the affected sections).
The committee discussed solving this problem globally by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
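The typeid special case mentioned above can be observed directly. The sketch below assumes a hypothetical polymorphic class Base and a hypothetical helper typeid_throws; typeid only looks at the dynamic type for polymorphic classes, so the class needs at least one virtual function for the rule to apply.

```cpp
#include <cassert>
#include <typeinfo>

// typeid only inspects the dynamic type for polymorphic classes,
// so the hypothetical Base needs at least one virtual function.
struct Base {
    virtual ~Base() {}
};

// Returns true if evaluating typeid(*p) threw std::bad_typeid.
bool typeid_throws(const Base* p) {
    try {
        (void)typeid(*p).name();
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}

// Small usage example; call it from main() to run the checks.
void demo_typeid() {
    Base b;
    assert(!typeid_throws(&b));      // real object: no exception
    assert(typeid_throws(nullptr));  // null pointer: std::bad_typeid
}
```

This is one of the few places where applying * to a null pointer has behavior the Standard spells out, which is why it sits oddly next to the general rule.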
Now, practically, you will not encounter a crash when merely dereferencing a null pointer. The problem of identifying an object or function for an lvalue seems to be entirely language-theoretical. What is problematic is when you try to read a value out of the result of the dereference. The following case will almost certainly crash, because it tries to read an integer from an address which is most probably not mapped by the affected process:
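A minimal sketch of such a read, not necessarily the snippet the answer originally showed: read_int is a hypothetical helper, and the null call is left commented out because it is formally undefined and would most likely fault.

```cpp
#include <cassert>

// Using *p as a value forces a load from the address stored in p.
// With a null p that load is undefined behavior and typically faults.
int read_int(const int* p) {
    return *p;
}

// Small usage example; call it from main() to run the check.
void demo_read_int() {
    int x = 42;
    assert(read_int(&x) == 42);  // fine: p refers to a real int
    // read_int(nullptr);        // undefined behavior; would most likely crash
}
```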
There are a few cases where reading out of such an expression probably won't cause a crash. One is when you dereference an array pointer:
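A sketch of the array-pointer situation, using a hypothetical helper first_element and an arbitrary array size of 3. Only the valid call is exercised; the null call is described in a comment because it remains formally undefined.

```cpp
#include <cassert>

// *pa is an lvalue of type int[3]; using it as a value only converts it
// to a pointer to its first element. No array element is loaded.
int* first_element(int (*pa)[3]) {
    return *pa;
}

// Small usage example; call it from main() to run the check.
void demo_first_element() {
    int arr[3] = {1, 2, 3};
    assert(first_element(&arr) == &arr[0]);
    // first_element(nullptr) would still be undefined behavior formally,
    // even though it would most likely just produce a null int*.
}
```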
Since reading from an array just returns its address using an element pointer type, this will most probably just produce a null pointer (but since you dereferenced a null pointer first, this is still formally undefined behavior). Another case is dereferencing null function pointers. Here too, reading a function lvalue just gives you its address, but using a function pointer type:
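A sketch of the function-pointer situation. The names answer, Fn, and round_trip are hypothetical and exist only to have something to point at; again only the valid call is exercised.

```cpp
#include <cassert>

// Hypothetical function used only to have something to point at.
int answer() { return 42; }

using Fn = int (*)();

// *f is a function lvalue; used as a value it converts straight back
// to a function pointer, so no memory at the target address is read.
Fn round_trip(Fn f) {
    return *f;
}

// Small usage example; call it from main() to run the checks.
void demo_round_trip() {
    Fn f = &answer;
    assert(round_trip(f) == f);
    assert(round_trip(f)() == 42);
    // round_trip(nullptr) is formally undefined, but would most likely
    // just hand back a null function pointer.
}
```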
As with the other cases, this is undefined behavior too, of course, but will probably not result in a crash.
Like the above cases, just calling a non-virtual member function on a null pointer most probably isn't practically problematic either, even though it formally is undefined behavior. Calling the function jumps to the function's address and doesn't need to read any data. As soon as you try to read a non-static data member, the same problem occurs as when reading out of a normal null pointer. Some people place a null check on the this pointer (for example, an assertion) at the start of some member function bodies in case they accidentally call a function on a null pointer. This may be a good idea when such functions are often mistakenly called on null pointers, to catch errors early. But from a formal point of view, this can never be a null pointer in a member function.
In the early days, programmers spent a lot of time tracking down memory corruption bugs. One day a light bulb lit up in some smart programmer's head. He said, "What if I make it illegal to access the first page of memory and point all invalid pointers to it?" Once that happened, most memory corruption bugs were quickly found.
That's the history behind the null pointer. I heard the story so many years ago that I can't recall any details now, but I'm sure someone who's older... I mean wiser, can tell us more about it.
You are wandering in undefined territories.
You can think of calling a member function like calling a regular function with an additional, implicit this pointer argument. The function call itself just puts the arguments in place according to the calling convention and jumps to a memory address.
So just calling a member function on a NULL object pointer does not necessarily cause a crash (unless it is a virtual function). You get invalid memory access crashes only when you try to access the object's member variables or vtable.
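That mental model can be written out as ordinary free functions with the this pointer made explicit. This is a sketch of the model, not of what any particular compiler emits; Counter, type_name, get_count, and the parameter name self are all hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class with one data member.
struct Counter {
    int count = 0;
};

// Free-function equivalents of two non-virtual member functions, with the
// implicit this pointer written out as an explicit parameter named self.

// Never reads through self, so the pointer value does not matter here.
const char* type_name(const Counter* /*self*/) {
    return "Counter";
}

// Reads a data member through self; a null self would fault at self->count.
int get_count(const Counter* self) {
    return self->count;
}

// Small usage example; call it from main() to run the checks.
void demo_explicit_this() {
    Counter c;
    assert(get_count(&c) == 0);
    // Well defined for a free function: the null pointer is only passed along.
    assert(std::strcmp(type_name(nullptr), "Counter") == 0);
}
```

For a real member function, calling p->type_name() on a null p is still formally undefined; the free-function form only models why it usually appears to work.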
In case #2 you may or may not get an immediate crash, depending on how anotherfunc is declared. If it takes someObj by value, then you're indirecting NULL in the function call itself, resulting in a crash. If it takes someObj by reference, usually nothing happens, since references are implemented using pointers under the hood and the actual indirection is postponed until you try to access member data.
I agree with Buck, in that in many cases it would be nice if calling an instance function on null resulted in null. However, I don't think that it should be the default. Instead there should be another operator (I'll leave what that is up to someone else, but let's say it's ->>).
One issue in C++, for instance, is that not all return types can be null anyway, such as int. So for a call to a->>length() it would be difficult to know what to return when a itself was null.
In other languages, where everything is a reference type, you would not have this problem.
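One possible way to approximate such an operator in C++ for a return type like int, which cannot be null, is to return a small wrapper that records whether a value is present. This is only a sketch: Text, MaybeInt, and safe_length are hypothetical names, and the ->> operator itself does not exist in C++.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class with a length() member returning a plain int.
struct Text {
    const char* data;
    int length() const { return static_cast<int>(std::strlen(data)); }
};

// An int cannot be null, so absence is tracked in a separate flag.
struct MaybeInt {
    bool has_value;
    int value;
};

// Stand-in for the imagined a->>length(): forwards the call for a real
// object and reports "no value" for a null pointer.
MaybeInt safe_length(const Text* a) {
    if (a == nullptr) {
        return MaybeInt{false, 0};
    }
    return MaybeInt{true, a->length()};
}

// Small usage example; call it from main() to run the checks.
void demo_safe_length() {
    Text t{"hello"};
    assert(safe_length(&t).has_value);
    assert(safe_length(&t).value == 5);
    assert(!safe_length(nullptr).has_value);
}
```

In C++17 and later, std::optional<int> can play the role of MaybeInt.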
Finally, Buck, what everyone else is saying is the way things are, especially for the C++ language. Dereferencing is a mechanical operation in most languages: it must return something of the same type, and null is typically stored as zero. Older systems would just crash when you tried to resolve zero; newer ones would recognize the special nature of the value when the error occurred.
Also, these lower-level languages cannot represent null as an integer (or other basic data types), so you could not in general dereference null as null in all cases.