Optimization of raw new[]/delete[] vs std::vector

Let's mess around with very basic dynamically allocated memory. We take a vector of 3, set its elements and return the sum of the vector.

In the first test case I used a raw pointer with new[]/delete[]. In the second I used std::vector:

#include <vector>   

int main()
{
  //int *v = new int[3];        // (1)
  auto v = std::vector<int>(3); // (2)


  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  //delete[] v;                 // (1)
  return s;
}

Assembly of (1) (new[]/delete[])

main:                                   # @main
        mov     eax, 6
        ret

Assembly of (2) (std::vector)

main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        mov     qword ptr [rax], 0
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:                                # %std::vector<int, std::allocator<int> >::~vector() [clone .exit]
        mov     eax, 6
        pop     rdx
        ret

Both outputs taken from https://gcc.godbolt.org/ with -std=c++14 -O3

In both versions the returned value is computed at compile time so we see just mov eax, 6; ret.

With the raw new[]/delete[] the dynamic allocation was completely removed. With std::vector however, the memory is allocated, set and freed.

This happens even with an unused variable auto v = std::vector<int>(3): call to new, memory is set and then call to delete.

I realize this is most likely a near impossible answer to give, but maybe someone has some insights and some interesting answers might pop out.

What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector case, like in the raw memory allocation case?

When using a pointer to a dynamically allocated array (directly using new[] and delete[]), the compiler optimized away the calls to operator new and operator delete even though they have observable side effects. This optimization is allowed by the C++ standard section 5.3.4 paragraph 10:

An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or...

I'll show the rest of the sentence, which is crucial, at the end.

This optimization is relatively new because it was first allowed in C++14 (proposal N3664). Clang supported it since 3.4. The latest version of gcc, namely 5.3.0, doesn't take advantage of this relaxation of the as-if rule. It produces the following code:

main:
        sub     rsp, 8
        mov     edi, 12
        call    operator new[](unsigned long)
        mov     DWORD PTR [rax], 1
        mov     DWORD PTR [rax+4], 2
        mov     rdi, rax
        mov     DWORD PTR [rax+8], 3
        call    operator delete[](void*)
        mov     eax, 6
        add     rsp, 8
        ret

MSVC 2013 also doesn't support this optimization. It produces the following code:

main:
  sub         rsp,28h  
  mov         ecx,0Ch  
  call        operator new[] ()  
  mov         rcx,rax  
  mov         dword ptr [rax],1  
  mov         dword ptr [rax+4],2  
  mov         dword ptr [rax+8],3  
  call        operator delete[] ()  
  mov         eax,6  
  add         rsp,28h  
  ret

I currently don't have access to MSVC 2015 Update 1 and therefore I don't know whether it supports this optimization or not.

Finally, here is the assembly code generated by icc 13.0.1:

main:
        push      rbp                                          
        mov       rbp, rsp                                   
        and       rsp, -128                                    
        sub       rsp, 128                                     
        mov       edi, 3                                       
        call      __intel_new_proc_init                         
        stmxcsr   DWORD PTR [rsp]                               
        mov       edi, 12                                 
        or        DWORD PTR [rsp], 32832                       
        ldmxcsr   DWORD PTR [rsp]                               
        call      operator new[](unsigned long)
        mov       rdi, rax                                      
        mov       DWORD PTR [rax], 1                            
        mov       DWORD PTR [4+rax], 2                          
        mov       DWORD PTR [8+rax], 3                         
        call      operator delete[](void*)
        mov       eax, 6    
        mov       rsp, rbp                           
        pop       rbp                                   
        ret

Clearly, it doesn't support this optimization. I don't have access to the latest version of icc, namely 16.0.

All of these code snippets have been produced with optimizations enabled.

When using std::vector, all of these compilers didn't optimize away the allocation. When a compiler doesn't perform an optimization, it's either because it cannot for some reason or it's just not yet supported.

What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector case, like in the raw memory allocation case?

The compiler didn't perform the optimization because it's not allowed to. To see this, let's see the rest of the sentence of paragraph 10 from 5.3.4:

An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.

What this is saying is that you can omit a call to a replaceable global allocation function only if it originated from a new-expression. A new-expression is defined in paragraph 1 of the same section.

The following expression

new int[3]

is a new-expression and therefore the compiler is allowed to optimize away the associated allocation function call.

On the other hand, the following expression:

::operator new(12)

is NOT a new-expression (see 5.3.4 paragraph 1). This is just a function call expression. In other words, this is treated as a typical function call. This function cannot be optimized away because its imported from another shared library (even if you linked the runtime statically, the function itself calls another imported function).

The default allocator used by std::vector allocates memory using ::operator new and therefore the compiler is not allowed to optimize it away.

Let's test this. Here's the code:

int main()
{
  int *v =  (int*)::operator new(12);

  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  delete v;
  return s;
}

By compiling using Clang 3.7, we get the following assembly code:

main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:
        mov     eax, 6
        pop     rdx
        ret

This is exactly the same as assembly code generated when using std::vector except for mov qword ptr [rax], 0 which comes from the constructor of std::vector (the compiler should have removed it but failed to do so because of a flaw in its optimization algorithms).