Let's mess around with very basic dynamically allocated memory. We take a vector of 3, set its elements and return the sum of the vector.
In the first test case I used a raw pointer with new[]
/delete[]
. In the second I used std::vector
:
#include <vector>
int main()
{
//int *v = new int[3]; // (1)
auto v = std::vector<int>(3); // (2)
for (int i = 0; i < 3; ++i)
v[i] = i + 1;
int s = 0;
for (int i = 0; i < 3; ++i)
s += v[i];
//delete[] v; // (1)
return s;
}
Assembly of (1) (new[]
/delete[]
)
main: # @main
mov eax, 6
ret
Assembly of (2) (std::vector
)
main: # @main
push rax
mov edi, 12
call operator new(unsigned long)
mov qword ptr [rax], 0
movabs rcx, 8589934593
mov qword ptr [rax], rcx
mov dword ptr [rax + 8], 3
test rax, rax
je .LBB0_2
mov rdi, rax
call operator delete(void*)
.LBB0_2: # %std::vector<int, std::allocator<int> >::~vector() [clone .exit]
mov eax, 6
pop rdx
ret
Both outputs taken from https://gcc.godbolt.org/ with -std=c++14 -O3
In both versions the returned value is computed at compile time so we see just mov eax, 6; ret
.
With the raw new[]
/delete[]
the dynamic allocation was completely removed. With std::vector
however, the memory is allocated, set and freed.
This happens even with an unused variable auto v = std::vector<int>(3)
: call to new
, memory is set and then call to delete
.
I realize this is most likely a near impossible answer to give, but maybe someone has some insights and some interesting answers might pop out.
What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector
case, like in the raw memory allocation case?
When using a pointer to a dynamically allocated array (directly using new[] and delete[]), the compiler optimized away the calls to operator new
and operator delete
even though they have observable side effects. This optimization is allowed by the C++ standard section 5.3.4 paragraph 10:
An implementation is allowed to omit a call to a replaceable global
allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage
is instead provided by the implementation or...
I'll show the rest of the sentence, which is crucial, at the end.
This optimization is relatively new because it was first allowed in C++14 (proposal N3664). Clang supported it since 3.4. The latest version of gcc, namely 5.3.0, doesn't take advantage of this relaxation of the as-if rule. It produces the following code:
main:
sub rsp, 8
mov edi, 12
call operator new[](unsigned long)
mov DWORD PTR [rax], 1
mov DWORD PTR [rax+4], 2
mov rdi, rax
mov DWORD PTR [rax+8], 3
call operator delete[](void*)
mov eax, 6
add rsp, 8
ret
MSVC 2013 also doesn't support this optimization. It produces the following code:
main:
sub rsp,28h
mov ecx,0Ch
call operator new[] ()
mov rcx,rax
mov dword ptr [rax],1
mov dword ptr [rax+4],2
mov dword ptr [rax+8],3
call operator delete[] ()
mov eax,6
add rsp,28h
ret
I currently don't have access to MSVC 2015 Update 1 and therefore I don't know whether it supports this optimization or not.
Finally, here is the assembly code generated by icc 13.0.1:
main:
push rbp
mov rbp, rsp
and rsp, -128
sub rsp, 128
mov edi, 3
call __intel_new_proc_init
stmxcsr DWORD PTR [rsp]
mov edi, 12
or DWORD PTR [rsp], 32832
ldmxcsr DWORD PTR [rsp]
call operator new[](unsigned long)
mov rdi, rax
mov DWORD PTR [rax], 1
mov DWORD PTR [4+rax], 2
mov DWORD PTR [8+rax], 3
call operator delete[](void*)
mov eax, 6
mov rsp, rbp
pop rbp
ret
Clearly, it doesn't support this optimization. I don't have access to the latest version of icc, namely 16.0.
All of these code snippets have been produced with optimizations enabled.
When using std::vector
, all of these compilers didn't optimize away the allocation. When a compiler doesn't perform an optimization, it's either because it cannot for some reason or it's just not yet supported.
What are the contributing factors that don't allow compiler
optimizations to remove the memory allocation in the std::vector case,
like in the raw memory allocation case?
The compiler didn't perform the optimization because it's not allowed to. To see this, let's see the rest of the sentence of paragraph 10 from 5.3.4:
An implementation is allowed to omit a call to a replaceable global
allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage
is instead provided by the implementation or provided by extending the
allocation of another new-expression.
What this is saying is that you can omit a call to a replaceable global allocation function only if it originated from a new-expression. A new-expression is defined in paragraph 1 of the same section.
The following expression
new int[3]
is a new-expression and therefore the compiler is allowed to optimize away the associated allocation function call.
On the other hand, the following expression:
::operator new(12)
is NOT a new-expression (see 5.3.4 paragraph 1). This is just a function call expression. In other words, this is treated as a typical function call. This function cannot be optimized away because its imported from another shared library (even if you linked the runtime statically, the function itself calls another imported function).
The default allocator used by std::vector
allocates memory using ::operator new
and therefore the compiler is not allowed to optimize it away.
Let's test this. Here's the code:
int main()
{
int *v = (int*)::operator new(12);
for (int i = 0; i < 3; ++i)
v[i] = i + 1;
int s = 0;
for (int i = 0; i < 3; ++i)
s += v[i];
delete v;
return s;
}
By compiling using Clang 3.7, we get the following assembly code:
main: # @main
push rax
mov edi, 12
call operator new(unsigned long)
movabs rcx, 8589934593
mov qword ptr [rax], rcx
mov dword ptr [rax + 8], 3
test rax, rax
je .LBB0_2
mov rdi, rax
call operator delete(void*)
.LBB0_2:
mov eax, 6
pop rdx
ret
This is exactly the same as assembly code generated when using std::vector
except for mov qword ptr [rax], 0
which comes from the constructor of std::vector (the compiler should have removed it but failed to do so because of a flaw in its optimization algorithms).