I'm pretty new to C++, but I know you can't just use memory willy-nilly the way the std::string class seems to let you. For instance:
std::string f = "asdf";
f += "fdsa";
How does the string class handle getting larger and smaller? I assume it allocates a default amount of memory, and if it needs more, it news a larger block of memory and copies itself over to that. But wouldn't it be pretty inefficient to have to copy the whole string every time it needed to resize? I can't really think of another way it could be done (but obviously somebody has).
And for that matter, how do all the stdlib classes like vector, queue, stack, etc handle growing and shrinking so transparently?
Usually, there's a doubling algorithm. In other words, when it fills the current buffer, it allocates a new buffer that's twice as big, and then copies the current data over. This results in fewer allocate/copy operations than the alternative of growing by a single allocation block.
Your analysis is correct: copying the string on every resize would be inefficient, which is why common advice discourages that usage pattern. Use the string's reserve function to ask it to allocate enough memory up front for what you intend to store in it. Subsequent operations will then fill that memory. (If your hint turns out to be too small, the string will still grow automatically.)
Containers will also usually try to mitigate the effects of frequent re-allocation by allocating more memory than they need. A common algorithm is that when a string finds that it's out of space, it doubles its buffer size instead of just allocating the minimum required to hold the new value. If the string is being grown one character at a time, this doubling algorithm reduces the time complexity to amortized linear time (instead of quadratic time). It also reduces the program's susceptibility to memory fragmentation.
Although I do not know the exact implementation of std::string, most data structures that need to handle dynamic memory growth do so exactly as you describe: allocate a default amount of memory, and if more is needed, allocate a bigger block and copy everything over.
The way you get around the obvious inefficiency problem is to allocate more memory than you need. The ratio of used memory to total memory in a vector/string/etc. is often called the load factor (the same term is used for hash tables, with a slightly different meaning). A common choice is a 1:2 ratio: when you run out of space, you allocate a new block twice the size of the current one and copy the contents over. This means that as you keep adding items, the copies happen less and less often (the capacity grows geometrically while your insertions arrive linearly), so the total time spent on this memory handling is not as large as you might think. By the principles of amortized analysis, inserting m items into a vector/string using this method is Θ(m), not Θ(m²).