C++: Optimizing speed with vector/array?

2019-02-08 15:31发布

问题:

I have a nested for-loop structure and right now I am re-declaring the vector at the start of each iteration:

void function (n1,n2,bound,etc){

    for (int i=0; i<bound; i++){
             vector< vector<long long> > vec(n1, vector<long long>(n2));
             //about three more for-loops here
    }
}

This allows me to "start fresh" each iteration, which works great because my internal operations are largely in the form of vec[a][b] += some value. But I worry that it's slow for large n1 or large n2. I don't know the underlying architecture of vectors/arrays/etc so I am not sure what the fastest way is to handle this situation. Should I use an array instead? Should I clear it differently? Should I handle the logic differently altogether?

EDIT: The vector's size technically does not change each iteration (but it may change based on function parameters). I'm simply trying to clear it/etc so the program is as fast as humanly possible given all other circumstances.

EDIT:

My results of different methods:

Timings (for a sample set of data):
reclaring vector method: 111623 ms
clearing/resizing method: 126451 ms
looping/setting to 0 method: 88686 ms

回答1:

Here is some code that tests a few different methods.

#include <chrono>
#include <iostream>
#include <vector>

int main()
{
  typedef std::chrono::high_resolution_clock clock;

  unsigned n1 = 1000;
  unsigned n2 = 1000;

  // Original method
  {
    auto start = clock::now();
    for (unsigned i = 0; i < 10000; ++i)
    {
      std::vector<std::vector<long long>> vec(n1, std::vector<long long>(n2));
      // vec is initialized to zero already

      // do stuff
    }
    auto elapsed_time = clock::now() - start;

    std::cout << elapsed_time.count() << std::endl;
  }


  // reinitialize values to zero at every pass in the loop
  {
    auto start = clock::now();
    std::vector<std::vector<long long>> vec(n1, std::vector<long long>(n2));
    for (unsigned i = 0; i < 10000; ++i)
    {
      // initialize vec to zero at the start of every loop
      for (unsigned j = 0; j < n1; ++j)
        for (unsigned k = 0; k < n2; ++k)
            vec[j][k] = 0;

      // do stuff
    }
    auto elapsed_time = clock::now() - start;

    std::cout << elapsed_time.count() << std::endl;
  }

  // clearing the vector this way is not optimal since it will destruct the
  // inner vectors
  {
    auto start = clock::now();
    std::vector<std::vector<long long>> vec(n1, std::vector<long long>(n2));
    for (unsigned i = 0; i < 10000; ++i)
    {
      vec.clear();
      vec.resize(n1, std::vector<long long>(n2));

      // do stuff
    }
    auto elapsed_time = clock::now() - start;

    std::cout << elapsed_time.count() << std::endl;
  }

  // equivalent to the second method from above
  // no performace penalty
  {
    auto start = clock::now();
    std::vector<std::vector<long long>> vec(n1, std::vector<long long>(n2));
    for (unsigned i = 0; i < 10000; ++i)
    {
      for (unsigned j = 0; j < n1; ++j)
      {
        vec[j].clear();
        vec[j].resize(n2);
      }

      // do stuff
    }
    auto elapsed_time = clock::now() - start;

    std::cout << elapsed_time.count() << std::endl;
  }
}

Edit: I've updated the code to make a fairer comparison between the methods. Edit 2: Cleaned up the code a bit, methods 2 or 4 are the way to go.

Here are the timings of the above four methods on my computer:

16327389
15216024
16371469
15279471

The point is that you should try out different methods and profile your code.



回答2:

I have a clear preference for small scopes (i.e. declaring the variable in the innermost loop if it’s only used there) but for large sizes this could cause a lot of allocations.

So if this loop is a performance problem, try declaring the variable outside the loop and merely clearing it inside the loop – however, this is only advantageous if the (reserved) size of the vector stays identical. If you are resizing the vector, then you get reallocations anyway.

Don’t use a raw array – it doesn’t give you any advantage, and only trouble.



回答3:

When choosing a container i usually use this diagram to help me:

source


Other than that,

Like previously posted if this is causing performance problems declare the container outside of the for loop and just clear it at the start of each iteration



回答4:

In addition to the previous comments :

if you use Robinson's swap method, you could go ever faster by handling that swap asynchronously.



回答5:

Why not something like that :

{
    vector< vector<long long> > vec(n1, vector<long long>(n2));

    for (int i=0; i<bound; i++){

         //about three more for-loops here

         vec.clear();
    }
}

Edit: added scope braces ;-)



回答6:

Well if you are really concerned about performance (and you know the size of n1 and n2 beforehand) but don't want to use a C-style array, std::array may be your friend.

EDIT: Given your edit, it seems an std::array isn't an appropriate substitute since while the vector size does not change each iteration, it still isn't known before compilation.



回答7:

Since you have to reset the vector values to 0 each iteration, in practical terms, this question boils down to "is the cost of allocating and deallocating the memory for the vector cheap or expensive compared to the computations inside the loops".

Assuming the computations are the expensive part of the algorithm, the way you've coded it is both clear, concise, shows the intended scope, and is probably just as fast as alternate approaches.

If however your computations and updates are extremely fast and the allocation/deallocation of the vector is relatively expensive, you could use std::fill to fill zeroes back into the array at the end/beginning of each iteration through the loop.

Of course the only way to know for sure is to measure with a profiler. I suspect you'll find that the approach you took won't show up as a hotspot of any sort and you should leave the obvious code in place.



回答8:

The overhead of using a vector vs an array is minor, especially when you are getting a lot of useful functionality from the vector. Internally a vector allocates an array. So vector is the way to go.



标签: c++ vector