Consider a simple program:
int main() {
int* ptr = nullptr;
delete ptr;
}
With GCC (7.2), there is a `call` instruction for `operator delete` in the resulting program. With the Clang and Intel compilers, there is no such instruction; the null-pointer deletion is completely optimized out (`-O2` in all cases). You can test here: https://godbolt.org/g/JmdoJi.
I wonder whether such an optimization can somehow be turned on with GCC? (My broader motivation stems from the problem of a custom `swap` vs. `std::swap` for movable types, where deletion of null pointers can represent a performance penalty in the latter case; see https://stackoverflow.com/a/45689282/580083 for details.)
UPDATE
To clarify my motivation for the question: if I use just `delete ptr;` without an `if (ptr)` guard in the move assignment operator and the destructor of some class, then `std::swap` with objects of that class yields 3 `call` instructions with GCC. This might be a considerable performance penalty, e.g., when sorting an array of such objects.
Moreover, I can write `if (ptr) delete ptr;` everywhere, but I wonder whether that cannot be a performance penalty as well, since the `delete` expression needs to check `ptr` anyway. Here, though, I guess compilers will generate only a single check. Also, I really like being able to call `delete` without the guard, and it was a surprise to me that it could yield different performance outcomes.
UPDATE
I just did a simple benchmark, namely sorting objects that invoke `delete` in their move assignment operator and destructor. The source is here: https://godbolt.org/g/7zGUvo. Running times of `std::sort`, measured with GCC 7.1 and the `-O2` flag on a Xeon E5-2680 v3:
There is a bug in the linked code: it compares pointers, not the pointed-to values. Corrected results are as follows (the first figure is from the original buggy run, the second is the corrected one):

- without `if` guard: 17.6 [s] / 40.8 [s],
- with `if` guard: 10.6 [s] / 31.5 [s],
- with `if` guard and custom `swap`: 10.4 [s] / 31.3 [s].
These results were absolutely consistent across many runs, with minimal deviation. The performance difference between the first two cases is significant, and I wouldn't say that this is some "exceedingly rare corner case" kind of code.
It's always safe (for correctness) to let your program call `operator delete` with a nullptr.

For performance, it's very rare that having the compiler-generated asm actually do an extra test and conditional branch to skip a call to `operator delete` will be a win. (You can help gcc optimize away compile-time `nullptr` deletion without adding a runtime check, though; see below.)

First of all, larger code size outside of a real hot spot increases pressure on the L1I cache, and on the even smaller decoded-uop cache on x86 CPUs that have one (Intel SnB-family, AMD Ryzen).
Second, extra conditional branches use up entries in the branch-prediction caches (BTB = Branch Target Buffer and so on). Depending on the CPU, even a branch that's never taken may worsen predictions for other branches if it aliases them in the BTB. (On others, such a branch never gets an entry in the BTB, to save entries for branches where the default static prediction of fall-through is accurate.) See https://xania.org/201602/bpu-part-one.
If `nullptr` is rare in a given code path, then on average the check & branch to avoid the `call` end up with your program spending more time on the check than the check saves.

If profiling shows you have a hot spot that includes a `delete`, and instrumentation / logging shows that it often actually calls `delete` with a nullptr, then it's worth trying `if (ptr) delete ptr;` instead of just `delete ptr;`.
Branch prediction might have better luck at that one call site than with the branch inside `operator delete`, especially if there's any correlation with other nearby branches. (Apparently modern BPUs don't just look at each branch in isolation.) This is on top of saving the unconditional `call` into the library function (plus another `jmp` from the PLT stub, dynamic-linking overhead on Unix/Linux).

If you are checking for null for any other reason, then it could make sense to put the `delete` inside the non-null branch of your code.

You can avoid `delete` calls in cases where gcc can prove (after inlining) that a pointer is null, without doing a runtime check when it can't. The check will always return false with clang, because clang evaluates `__builtin_constant_p` before inlining; but since clang already skips `delete` calls when it can prove a pointer is null, you don't need it there.

This might actually help in `std::move` cases, and you can safely use it anywhere with (in theory) no performance downside. It always compiles to `if(true)` or `if(false)`, so it's very different from `if(ptr)`, which is likely to result in a runtime branch because the compiler probably can't prove the pointer is non-null in most cases. (A dereference might allow that, though, because a null deref would be UB, and modern compilers optimize based on the assumption that the code doesn't contain any UB.)

You could make this a macro to avoid bloating non-optimized builds (and so it would "work" without having to inline first). You can use a GNU C statement expression to avoid double-evaluating the macro arg (see examples for GNU C `min()` and `max()`). As a fallback for compilers without GNU extensions, you could write `((ptr), false)` or something similar to evaluate the arg once for its side effects while producing a `false` result.

Demonstration: asm from gcc6.3 -O3 on the Godbolt compiler explorer.
It compiles correctly with MSVC too (also at the compiler explorer link), but with the test always returning false, so `bar()` still contains the call. It's interesting to note that MSVC's `operator delete` takes the object size as a function arg (`mov edx, 4`), while the gcc/Linux/libstdc++ code just passes the pointer.

Related: I found a blog post on using C11 (not C++11) `_Generic` to try to portably do something like `__builtin_constant_p` null-pointer checks inside static initializers.

It's a QOI issue. clang does indeed elide the test:
https://godbolt.org/g/nBSykD
I think the compiler has no knowledge about `delete`, in particular that `delete` of a null pointer is a no-op. You may write the check explicitly, so that the compiler does not need any special knowledge about `delete`.

WARNING: I do not recommend this as a general implementation. The following example should show how you could "convince" a limited compiler to remove the code anyway, in this very special and limited program.
If I remember right, there is a way to replace `delete` with your own function, and in that case an optimization by the compiler could go wrong.
@RichardHodges: Why should it be a de-optimization when one gives the compiler a hint to remove a call?

`delete` of a null pointer is in general a no-op (no operation). However, since it is possible to replace or override `delete`, there is no guarantee of that for all cases. So it is up to the compiler to know, and to decide, whether to use the knowledge that a `delete` of a null pointer can always be removed; there are good arguments for both choices.

However, the compiler is always allowed to remove dead code, such as `if (false) {...}` or `if (nullptr != nullptr) {...}`. So a compiler will remove the dead code, and then, when using explicit checking, the generated code looks the same. Please tell me: where is the de-optimization? I call my proposal a defensive style of coding, not a de-optimization.

If someone argues that the non-null case now checks for nullptr twice, I have to reply that, here, I expect compilers to merge the two checks into one.
@Peter Cordes: I agree that guarding with an `if` is not a general optimization rule. However, general optimization was NOT the opener's question. The question was why some compilers do not eliminate the `delete` in a very short, nonsensical program. I showed a way to make the compiler eliminate it anyway.

If a situation like the one in that short program arises, probably something else is wrong. In general I would try to avoid new/delete (malloc/free), as the calls are rather expensive. If possible, I prefer to use the stack (automatic storage).

Looking at the meanwhile-documented real case, I would say class X is designed wrongly, causing poor performance and too much memory use. (https://godbolt.org/g/7zGUvo)
Instead of the posted class design, I would design it differently; or, even earlier, I would question the sense of sorting empty/invalid items at all. In the end, I would like to get rid of "valid", too.
According to C++14 [expr.delete]/7:

So both compilers comply with the standard, because it is unspecified whether `operator delete` is called for deletion of a null pointer.

Note that the godbolt online compiler just compiles the source file, without linking. So the compiler at that stage must allow for the possibility that `operator delete` will be replaced in another source file.

As already speculated in another answer, gcc may be angling for consistent behaviour in the case of a replacement `operator delete`; this implementation would mean that someone can overload that function for debugging purposes and break on all invocations of the `delete` expression, even when it happens to be deleting a null pointer.

UPDATED: Removed speculation that this might not be a practical issue, since the OP provided benchmarks showing that it in fact is.
The standard actually states when allocation and deallocation functions shall be called and when they shall not. This clause (from n4296) is probably the main reason why those function calls aren't omitted arbitrarily: if they were, replacing their library implementation would cause incoherent behaviour of the compiled program.

...

The standard states what should be done if the pointer is NOT null, implying that `delete` on a null pointer is a no-op; but to what end, it does not specify.
First of all, I'll just agree with some previous answerers that it's not a bug, and GCC may do as it pleases here. That said, I was wondering whether this means that some common and simple RAII code may be slower on GCC than on Clang because a straightforward optimization is not done.
So I wrote a small test case for RAII:
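The test case itself is not reproduced in this copy; the following is a sketch of what it plausibly looked like, with the names `setA1` and `getA2` taken from the discussion that follows and everything else my own guesswork:

```cpp
// Sketch of the RAII test case (a reconstruction, not the answer's code).
struct A {
    int* ptr = nullptr;
    A() = default;
    explicit A(int v) : ptr(new int(v)) {}
    A(A&& o) noexcept : ptr(o.ptr) { o.ptr = nullptr; }
    A& operator=(A&& o) noexcept {
        delete ptr;          // first delete: required, *this may own memory
        ptr = o.ptr;
        o.ptr = nullptr;
        return *this;
    }
    ~A() { delete ptr; }     // second delete: null for the moved-from temporary
};

A getA2() { return A(2); }

void setA1(A& a1) {
    // The temporary returned by getA2() is moved from, so its destructor
    // runs delete on a pointer that is provably null.
    a1 = getA2();
}
```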
As you may see here, GCC does elide the second call to `delete` in `setA1` (the one for the moved-from temporary that was created in the call to `getA2`). The first call is necessary for program correctness, because `a1` or `a1.ptr` may have been assigned to previously.

Obviously I would prefer more "rhyme and reason" as to why the optimization is done sometimes but not always, but I'm not willing to sprinkle redundant `if ( ptr != nullptr )` checks all over my RAII code just yet.