Is shrink_to_fit the proper way of reducing the ca

2019-01-14 02:46发布

问题:

In C++11 shrink_to_fit was introduce to complement certain STL containers (e.g., std::vector, std::deque, std::string).

Synopsizing, its main functionality is to request the container that is associated to, to reduce its capacity to fit its size. However, this request is non-binding, and the container implementation is free to optimize otherwise and leave the vector with a capacity greater than its size.

Furthermore, in a previous SO question the OP was discouraged from using shrink_to_fit to reduce the capacity of his std::vector to its size. The reasons not to do so are quoted below:

shrink_to_fit does nothing or it gives you cache locality issues and it's O(n) to execute (since you have to copy each item to their new, smaller home). Usually it's cheaper to leave the slack in memory. @Massa

Could someone be so kind as to address the following questions:

  • Do the arguments in the quotation hold?
  • If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
  • And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

回答1:

Do the arguments in the quotation hold?

Measure and you will know. Are you constrained in memory? Can you figure out the correct size up front? It will be more efficient to reserve than it is to shrink after the fact. In general I am inclined to agree on the premise that most uses are probably fine with the slack.

If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The comment does not only apply to shrink_to_fit, but to any other way of shrinking. Given that you cannot realloc in place, it involves acquiring a different chunk of memory and copying over there regardless of what mechanism you use for shrinking.

And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

The request is non-binding, but the alternatives don't have better guarantees. The question is whether shrinking makes sense, if it does, then it makes sense to provide a shring_to_fit operation that can take advantage of the fact that the objects are being moved to a new location. I.e. if the type T has a noexcept(true) move constructor, it will allocate the new memory and move the elements.

While you can achieve the same externally, this interface simplifies the operation. The equivalent to shrink_to_fit in C++03 would have been:

std::vector<T>(current).swap(current);

But the problem with this approach is that when the copy is done to the temporary it does not know that current is going to be replaced, there is nothing that tells the library that it can move the held objects. Note that using std::move(current) would not achieve the desired effect as it would move the whole buffer, maintaining the same capacity().

Implementing this externally would be a bit more cumbersome:

{
   std::vector<T> copy;
   if (noexcept(T(std::move(declval<T>())))) {
      copy.assign(std::make_move_iterator(current.begin()),
                  std::make_move_iterator(current.end()));
   } else {
      copy.assign(current.begin(), current.end());
   }
   copy.swap(current);
}

Assuming that I got the if condition right... which is probably not what you want to write every time that you want this operation.



回答2:

  • Will the arguments hold?

As the arguments are originally mine, don't mind if I defend them, one by one:

  1. Either shrink_to_fit does nothing (...)

    As it was mentioned, the standard says (many times, but in the case of vector it's section 23.3.7.3...) that the request is non-binding to allow an implementation latitude for optimizations. This means that the implementation can define shrink_to_fit as an no-op.

  2. (...) or it gives you cache locality issues

    In the case that shrink_to_fit is not implemented as a no-op, you have to allocate a new underlying container with capacity size(), copy (or, in the best case, move) construct all your N = size() new items from the old ones, destruct all the old ones (in the move case this should be optimized, but it's possible that this involves a loop again over the old container) and then destructing the old container per se. This is done, in libstdc++-4.9, exactly as David Rodriguez has described, by

          _Tp(__make_move_if_noexcept_iterator(__c.begin()),
              __make_move_if_noexcept_iterator(__c.end()),
              __c.get_allocator()).swap(__c);
    

    and in libc++-3.5, by a function in __alloc_traits that does approximately the same.

    Oh, and an implementation absolutely cannot rely on realloc (even if it uses malloc inside ::operator new for its memory allocations) because realloc, if it cannot shrink in-place, will either leave the memory alone (no-op case) or make a bitwise copy (and miss the opportunity for readjusting pointers, etc. that the proper C++ copying/moving constructors would give).

    Sure, one can write a shrinkable memory allocator, and use it in the constructor of its vectors.

    In the easy case where the vectors are larger than the cache lines, all that movement puts pressure on the cache.

  3. and it's O(n)

    If n = size(), I think it was established above that, at the very least, you have to do one n sized allocation, n copy or move constructions, n destructions, and one old_capacity sized deallocation.

  4. usually it's cheaper just to leave the slack in memory

    Obviously, unless you are really pressed for free memory (in which case it might be wiser to save your data to the disk and re-load it later on demand...)

  • If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The proper way is still shrink_to_fit... you just have to either not rely on it or know very well your implementation!

  • And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

There is no better way, but the reason for the existence of shrink_to_fit is, AFAICT, that sometimes your program might feel memory pressure and it's one way of treating it. Not a very good way, but still.

HTH!



回答3:

  • If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).

The 'swap trick' will trim a vector to the exact size required (from More Effective SQL):

vector<Person>(persons).swap(persons);

Particularly useful when the vector is empty, to release all memory:

vector<Person>().swap(persons);

Vectors were constantly tripping my unit tester's memory leak detection code because of retained allocations of unused space, and this sorted them out perfectly.

This is the kind of example where I really don't care about runtime efficiency (size or speed), but I do care about exact memory usage.

  • And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?

I really don't know what the point of providing a function that can legally do absolutely nothing is. I cheered when I saw it had been introduced, then despaired when I found it couldn't be relied upon.

Perhaps we'll see maybe_sort() in the next version.