Nested STL vector using way too much memory

2019-02-15 02:05发布

站内文章 / C++

18 0

一夜七次

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have an STL vector My_Partition_Vector of Partition objects, defined as

struct Partition // the event log data structure
{
    int key;
    std::vector<std::vector<char> > partitions;
    float modularity;
};

The actual nested structure of Partition.partitions varies from object to object but in the total number of chars stored in Partition.partitions is always 16.

I assumed therefore that the total size of the object should be more or less 24 bytes (16 + 4 + 4). However for every 100,000 items I add to My_Partition_Vector, memory consumption (found using ps -aux) increases by around 20 MB indicating around 209 bytes for each Partition Object.

This is a nearly 9 Fold increase!? Where is all this extra memory usage coming from? Some kind of padding in the STL vector, or the struct? How can I resolve this (and stop it reaching into swap)?

回答1:

For one thing std::vector models a dynamic array so if you know that you'll always have 16 chars in partitions using std::vector is overkill. Use a good old C style array/matrix, boost::array or boost::multi_array.

To reduce the number of re-allocations needed for inserting/adding elements due to it's memory layout constrains std::vector is allowed to preallocate memory for a certain number of elements upfront (and it's capacity() member function will tell you how much).

回答2:

While I think he may be overstating the situation just a tad, I'm in general agreement with DeadMG's conclusion that what you're doing is asking for trouble.

Although I'm generally the one looking at (whatever mess somebody has made) and saying "don't do that, just use a vector", this case might well be an exception. You're creating a huge number of objects that should be tiny. Unfortunately, a vector typically looks something like this:

template <class T>
class vector { 
    T *data;
    size_t allocated;
    size_t valid;
public:
    // ...
};

On a typical 32-bit machine, that's twelve bytes already. Since you're using a vector<vector<char> >, you're going to have 12 bytes for the outer vector, plus twelve more for each vector it holds. Then, when you actually store any data in your vectors, each of those needs to allocate a block of memory from the free store. Depending on how your free store is implemented, you'll typically have a minimum block size -- frequently 32 or even 64 bytes. Worse, the heap typically has some overhead of its own, so it'll add some more memory onto each block, for its own book-keeping (e.g., it might use a linked list of blocks, adding another pointer worth of data to each allocation).

Just for grins, let's assume you average four vectors of four bytes apiece, and that your heap manager has a 32-byte minimum block size and one extra pointer (or int) for its bookkeeping (giving a real minimum of 36 bytes per block). Multiplying that out, I get 204 bytes apiece -- close enough to your 209 to believe that's reasonably close to what you're dealing with.

The question at that point is how to deal with the problem. One possibility is to try to work behind the scenes. All the containers in the standard library use allocators to get their memory. While they default allocator gets memory directly from the free store, you can substitute a different one if you choose. If you do some looking around, you can find any number of alternative allocators, many/most of which are to help with exactly the situation you're in -- reducing wasted memory when allocating lots of small objects. A couple to look at would be the Boost Pool Allocator and the Loki small object allocator.

Another possibility (that can be combined with the first) would be to quit using a vector<vector<char> > at all, and replace it with something like:

char partitions[16];
struct parts { 
    int part0 : 4;
    int part1 : 4;
    int part2 : 4;
    int part3 : 4;
    int part4 : 4;
    int part5 : 4;
    int part6 : 4
    int part7 : 4;
};

For the moment, I'm assuming a maximum of 8 partitions -- if it could be 16, you can add more to parts. This should probably reduce memory usage quite a bit more, but (as-is) will affect your other code. You could also wrap this up into a small class of its own that provides 2D-style addressing to minimize impact on the rest of your code.

回答3:

If you store a near constant amount of objects, then I suggest to use a 2-dimensional array.

The most likely reason for the memory consumption is debug data. STL implementations usually store A LOT of debug data. Never profile an application with debug flags on.

回答4:

...This is a bit of a side conversation, but boost::multi_array was suggested as an alternative to the OP's use of nested vectors. My finding was that multi_array was using a similar amount of memory when applied to the OP's operating parameters.

I derived this code from the example at Boost.MultiArray. On my machine, this showed multi_array using about 10x more memory than ideally required assuming that the 16 bytes are arranged in a simple rectangular geometry.

To evaluate the memory usage, I checked the system monitor while the program was running and I compiled with

( export CXXFLAGS="-Wall -DNDEBUG -O3" ; make main && ./main )

Here's the code...

   #include <iostream>
   #include <vector>
   #include "boost/multi_array.hpp"
   #include <tr1/array>
   #include <cassert>

   #define USE_CUSTOM_ARRAY 0 // compare memory usage of my custom array vs. boost::multi_array

   using std::cerr;
   using std::vector;

  #ifdef USE_CUSTOM_ARRAY
   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<char,YSIZE*XSIZE> data;
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoid pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };
  #endif

int main ()
   {

   int COUNT = 1024*1024;

  #if USE_CUSTOM_ARRAY
   vector< array_2D<char,4,4> > A( COUNT );
   typedef int index;
  #else
   typedef boost::multi_array<char,2> array_type;
   typedef array_type::index index;
   vector<array_type> A( COUNT, array_type(boost::extents[4][4]) );
  #endif

  // Assign values to the elements
  int values = 0;
  for ( int n=0; n<COUNT; n++ )
     for(index i = 0; i != 4; ++i) 
       for(index j = 0; j != 4; ++j)
           A[n][i][j] = values++;

// Verify values
   int verify = 0;
    for ( int n=0; n<COUNT; n++ )
       for(index i = 0; i != 4; ++i) 
          for(index j = 0; j != 4; ++j)
             {
             assert( A[n][i][j] == (char)((verify++)&0xFF) );
            #if USE_CUSTOM_ARRAY
             assert( A[n][i][j] == A[n](i,j) ); // testing accessors
            #endif
             }

   cerr <<"spinning...\n";
   while ( 1 ) {} // wait here (so you can check memory usage in the system monitor)

   return 0;
   }

回答5:

On my system, sizeof(vector) is 24. This probably corresponds to 3 8-byte members: capacity, size, and pointer. Additionally, you need to consider the actual allocations which would be between 1 and 16 bytes (plus allocation overhead) for the inner vector and between 24 and 384 bytes for the outer vector ( sizeof(vector) * partitions.capacity() ).

I wrote a program to sum this up...

   for ( int Y=1; Y<=16; Y++ )
      {

      const int X = 16/Y;
      if ( X*Y != 16 ) continue; // ignore imperfect geometries

      Partition a;
      a.partitions = vector< vector<char> >( Y, vector<char>(X) );

      int sum = sizeof(a); // main structure
      sum += sizeof(vector<char>) * a.partitions.capacity(); // outer vector
      for ( int i=0; i<(int)a.partitions.size(); i++ )
         sum += sizeof(char) * a.partitions[i].capacity(); // inner vector

      cerr <<"X="<<X<<", Y="<<Y<<", size = "<<sum<<"\n";

      }

The results show how much memory (not including allocation overhead) is need for each simple geometry...

X=16, Y=1, size = 80
X=8, Y=2, size = 104
X=4, Y=4, size = 152
X=2, Y=8, size = 248
X=1, Y=16, size = 440

Look at the how the "sum" is calculated to see what all of the components are.

The results posted are based on my 64-bit architecture. If you have a 32-bit architecture the sizes would be almost half as much -- but still a lot more than what you had expected.

In conclusion, std::vector<> is not very space efficient for doing a whole bunch of very small allocations. If your application is required to be efficient, then you should use a different container.

My approach to solving this would probably be to allocate the 16 chars with

std::tr1::array<char,16>

and wrap that with a custom class that maps 2D coordinates onto the array allocation.

Below is a very crude way of doing this, just as an example to get you started. You would have to change this to meet your specific needs -- especially the ability to specify the geometry dynamically.

   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<char,YSIZE*XSIZE> data;
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoid pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };

回答6:

16 bytes is a complete and total waste. You're storing a hell of a lot of data about very small objects. A vector of vector is the wrong solution to use. You should log sizeof(vector) - it's not insignificant, as it performs a substantial function. On my compiler, sizeof(vector) is 20. So each Partition is 4 + 4 + 16 + 20 + 20*number of inner partitions + memory overheads like the vectors not being the perfect size.

You're only storing 16 bytes of data, and wasting ridiculous amounts of memory allocating them in the most segregated, highest overhead way you could possibly think of. The vector doesn't use a lot of memory - you have a terrible design.