Most std::string
implementations (GCC included) use small string optimization. E.g. there's an answer discussing this.
Today, I decided to check at what point a string in a code I compile gets moved to the heap. To my surprise, my test code seems to show that no small string optimization occurs at all!
Code:
#include <iostream>
#include <string>
using std::cout;
using std::endl;
int main(int argc, char* argv[]) {
std::string s;
cout << "capacity: " << s.capacity() << endl;
cout << (void*)s.c_str() << " | " << s << endl;
for (int i=0; i<33; ++i) {
s += 'a';
cout << (void*)s.c_str() << " | " << s << endl;
}
}
The output of g++ test.cc && ./a.out
is
capacity: 0
0x7fe405f6afb8 |
0x7b0c38 | a
0x7b0c68 | aa
0x7b0c38 | aaa
0x7b0c38 | aaaa
0x7b0c68 | aaaaa
0x7b0c68 | aaaaaa
0x7b0c68 | aaaaaaa
0x7b0c68 | aaaaaaaa
0x7b0c98 | aaaaaaaaa
0x7b0c98 | aaaaaaaaaa
0x7b0c98 | aaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0d28 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
I'm guessing that the larger first pointer, i.e. 0x7fe405f6afb8
is a stack pointer, and the other ones point to the heap. Running this many times produces identical results, in the sense that the first address is always large, and the other ones are smaller; the exact values usually differ. The smaller addresses always follow the standard power of 2 allocation scheme, e.g. 0x7b0c38
is listed once, then 0x7b0c68
is listed once, then 0x7b0c38
twice, then 0x7b0c68
4 times, then 0x7b0c98
8 times, etc.
After reading Howard's answer, using a 64bit machine, I was expecting to see the same address printed for the first 22 characters, and only then to see it change.
Am I missing something?
Also, interestingly, if I compile with -O
(at any level), I get a constant small pointer value 0x6021f8
in the first case, instead of the large value, and this 0x6021f8
doesn't change regardless of how many times I run the program.
Output of g++ -v
:
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/foo/bar/gcc-6.2.0/gcc/libexec/gcc/x86_64-redhat-linux/6.2.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../gcc-6.2.0/configure --prefix=/foo/bar/gcc-6.2.0/gcc --build=x86_64-redhat-linux --disable-multilib --enable-languages=c,c++,fortran --with-default-libstdcxx-abi=gcc4-compatible --enable-bootstrap --enable-threads=posix --with-long-double-128 --enable-long-long --enable-lto --enable-__cxa_atexit --enable-gnu-unique-object --with-system-zlib --enable-gold
Thread model: posix
gcc version 6.2.0 (GCC)