Consider the following test program:
#include <iostream>
#include <string>
#include <vector>
int main()
{
    std::cout << sizeof(std::string("hi")) << " ";
    std::string a[10];
    std::cout << sizeof(a) << " ";
    std::vector<std::string> v(10);
    std::cout << sizeof(v) + sizeof(std::string) * v.capacity() << "\n";
}
The outputs for libstdc++ and libc++, respectively, are:
8 80 104
24 240 264
As you can see, libc++ takes 3 times as much memory for a simple program. How do the implementations differ to cause this memory disparity? Do I need to be concerned, and how do I work around it?
Summary: It only looks like libstdc++ uses one char*. In fact, it allocates more memory.
So, you should not be concerned that Clang's libc++ implementation is memory inefficient.
From the documentation of libstdc++ (under Detailed Description):
So, it just looks like one char*, but that is misleading in terms of memory usage.
Previously, libstdc++ basically used this layout:
That is closer to the results from libc++.
libc++ uses "short string optimization". The exact layout depends on whether _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT is defined. If it is defined, the data pointer will be word-aligned if the string is short. For details, see the source code.
Short string optimization avoids heap allocations, so it also looks more costly than the libstdc++ implementation if you only consider the parts that are allocated on the stack.
sizeof(std::string) only shows the stack usage, not the overall memory usage (stack + heap).
Here is a short program to help you explore both kinds of memory usage of std::string: stack and heap.
Using http://melpon.org/wandbox/ it is easy to get output for different compiler/lib combinations, for example:
gcc 4.9.1:
gcc 5.0.0:
clang/libc++:
VS-2015:
(the last line is from http://webcompiler.cloudapp.net)
The above output also shows capacity, which is a measure of how many chars the string can hold before it has to allocate a new, larger buffer from the heap. For the gcc-5.0, libc++, and VS-2015 implementations, this is a measure of the short string buffer. That is, the size of the buffer allocated on the stack to hold short strings, thus avoiding the more expensive heap allocation.
It appears that the libc++ implementation has the smallest stack usage of the short-string implementations, and yet contains the largest of the short string buffers. And if you count total memory usage (stack + heap), libc++ has the smallest total memory usage for this 2-character string among all 4 of these implementations.
It should be noted that all of these measurements were taken on 64 bit platforms. On 32 bit, the libc++ stack usage will go down to 12, and the small string buffer goes down to 10. I don't know the behavior of the other implementations on 32 bit platforms, but you can use the above code to find out.
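For anyone who wants to repeat this kind of measurement, here is a minimal sketch of one way to do it. It is not necessarily the program used for the numbers above. It assumes that replacing the global operator new is an acceptable way to count heap bytes; the names g_heap_bytes, Usage, and measure are made up for illustration. An optimizing compiler is allowed to elide allocations, so treat the heap figure as indicative.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Running total of bytes requested through the replaced global operator new.
// (Hypothetical helper name, for illustration only.)
static std::size_t g_heap_bytes = 0;

// Replacement global operator new: count the request, then defer to malloc.
void* operator new(std::size_t n)
{
    g_heap_bytes += n;
    if (void* p = std::malloc(n ? n : 1))
        return p;
    throw std::bad_alloc();
}

// Matching replacements so memory from malloc is always released with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Usage {
    std::size_t stack;     // sizeof(std::string): the part that lives in the object
    std::size_t heap;      // bytes requested from operator new during construction
    std::size_t capacity;  // chars the string can hold before it must reallocate
};

// Construct a std::string from `text` and report where its memory went.
Usage measure(const char* text)
{
    const std::size_t before = g_heap_bytes;
    std::string s(text);

    // Read one byte through a volatile pointer so an optimizing compiler
    // is less tempted to elide the allocation altogether.
    const volatile char* probe = s.data();
    char first = *probe;
    (void)first;

    return Usage{sizeof s, g_heap_bytes - before, s.capacity()};
}
```

Calling measure("hi") from main and printing the three fields gives the stack, heap, and capacity figures for whichever compiler and standard library the program is built with.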
I haven't checked the actual implementations in source code, but I remember checking this when I was working on my C++ string library. A 24 byte string implementation is typical. If the length of the string is smaller than or equal to 16 bytes, instead of malloc'ing from the heap, it copies the string into the internal buffer of size 16 bytes. Otherwise, it mallocs and stores the memory address etc. This minor buffering actually helps in terms of running time performance.
For some compilers, there's an option to turn the internal buffer off.
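To see whether a particular string is sitting in the internal buffer or on the heap, one rough check is to test whether data() points inside the string object itself. This is only a sketch under that assumption, not something the standard guarantees any meaning for; stored_inline is a made-up helper name.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Returns true if the character buffer of `s` lies inside the std::string
// object itself, which is what a small-string buffer looks like from outside.
// (Hypothetical helper, for illustration only.)
bool stored_inline(const std::string& s)
{
    const char* first = reinterpret_cast<const char*>(&s);
    const char* last  = first + sizeof s;
    const char* data  = s.data();

    // std::less gives a well-defined total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, first) && before(data, last);
}
```

With an implementation that uses the small-string optimisation, a short string such as "hi" would be expected to report true, while a string far longer than sizeof(std::string) has to live on the heap and reports false.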
You should not be concerned; standard library implementors know what they are doing.
Using the latest code from the GCC subversion trunk, libstdc++ gives these numbers:
This is because, as of a few weeks ago, I switched the default std::string implementation to use the small-string optimisation (with space for 15 chars) instead of the copy-on-write implementation that you tested with.