std::deque memory usage - Visual C++, and comparis

2019-02-16 06:37发布

问题:

Follow up to What the heque is going on with the memory overhead of std::deque?

Visual C++ manages deque blocks according to the container element type using this:

#define _DEQUESIZ   (sizeof (value_type) <= 1 ? 16 \
    : sizeof (value_type) <= 2 ? 8 \
    : sizeof (value_type) <= 4 ? 4 \
    : sizeof (value_type) <= 8 ? 2 \
    : 1)    /* elements per block (a power of 2) */

This results in very large memory footprint for small elements. By changing the 16 in the first line to 128 I was able to drastically reduce the footprint required for a large deque<char>. Process Explorer Private Bytes dropped from 181MB -> 113MB after 100m push_back(const char& mychar) calls).

  • Can anybody justify the values in that #define?
  • How do other compilers handle deque block sizing?
  • What would be their footprint (32-bit operation) for the simple test of 100m push_back calls to deque<char>?
  • Does STL allow for overriding of this block size at compile-time, without modifying the <deque> code?

回答1:

gcc has

return __size < 512 ? size_t(512 / __size) : size_t(1);

with a comment

/*  The '512' is
 *  tunable (and no other code needs to change), but no investigation has
 *  been done since inheriting the SGI code.
 */


回答2:

The Dinkumware (MS) implementation wants to grow the deque by 16-bytes at a time. Could it be that this is just an extremely old implementation (like the first one ever?) that was tuned for platforms with very little memory (by today's standards) to prevent overallocating and exhausting memory (like a std::vector will do)?

I had to implement my own queue in an application I'm working on because the 2.5X memory footprint of std::queue (which uses std::deque) was unacceptable.

There seems to be very little evidence on the interwebs that people have run into this inefficiency, which is surprising to me. I would think such a fundamental data structure as a queue (standard library, no less) would be quite ubiquitous in the wild, and would be in performance/time/space-critical applications. But here we are.

To answer the last question, the C++ standard does not define an interface to modify the block size. I'm pretty sure it doesn't mandate any implementation, just complexity requirements for insertions/removals at both ends.



回答3:

STLPort

... seems to use:

::: <stl/_alloc.h>
...
enum { _MAX_BYTES = 32 * sizeof(void*) };
...
::: <deque>
...
static size_t _S_buffer_size()
{
  const size_t blocksize = _MAX_BYTES;
  return (sizeof(_Tp) < blocksize ? (blocksize / sizeof(_Tp)) : 1);
}

So that would mean 32 x 4 = 128 bytes block size on 32bit and 32 x 8 = 256 bytes block size on 64 bit.

My thought: From a size overhead POV, I guess it would make sense for any implementation to operate with variable length blocks, but I think this would be extremely hard to get right with the constant time random access requirement of deque.

As for the question

Does STL allow for overriding of this block size at compile-time, without modifying the code?

Not possible here either.

Apache

(seems to be the Rogue Wave STL version) apparently uses:

static size_type _C_bufsize () {
    // deque only uses __rw_new_capacity to retrieve the minimum
    // allocation amount; this may be specialized to provide a
    // customized minimum amount
    typedef deque<_TypeT, _Allocator> _RWDeque;
    return _RWSTD_NEW_CAPACITY (_RWDeque, (const _RWDeque*)0, 0);
}

so there seems to be some mechanism to override the block size via specialization and the definition of ... looks like this:

// returns a suggested new capacity for a container needing more space
template <class _Container>
inline _RWSTD_CONTAINER_SIZE_TYPE
__rw_new_capacity (_RWSTD_CONTAINER_SIZE_TYPE __size, const _Container*)
{
    typedef _RWSTD_CONTAINER_SIZE_TYPE _RWSizeT;

    const _RWSizeT __ratio = _RWSizeT (  (_RWSTD_NEW_CAPACITY_RATIO << 10)
                                       / _RWSTD_RATIO_DIVIDER);

    const _RWSizeT __cap =   (__size >> 10) * __ratio
                           + (((__size & 0x3ff) * __ratio) >> 10);

    return (__size += _RWSTD_MINIMUM_NEW_CAPACITY) > __cap ? __size : __cap;
}

So I'd say it's, aehm, complicated.

(If anyone feels like figuring this out further, feel free to edit my answer directly or just leave a comment.)