Performance gap between sorting a list and a vecto

2019-04-05 23:36发布

问题:

I wrote a simple C++ code to check the speed of sorting data , represented in the form of a list and then a vector.

In the case of the list I am getting time as 27 seconds. For a vector I get 10 seconds. Why the huge performance gap? Aren't the algorithms used for sorting the list and the vector the same? viz. mergesort?

EDIT: I may be wrong on the last point. As I know, textbooks when descirbing sorting algorithms theoretically, seem to be use the word list in the sense of a std::vector. I don't know how how sorting algorithms for vectors would be different from sorting algorithms for lists, so if some one could clarify that would be really helpful. Thank you.

 //In this code we compare the sorting times for lists and vectors.
//Both contain a sequence of structs

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <time.h>
#include <math.h>
#include <stdlib.h>
#include <iomanip>
using namespace std;


struct particle
{
  double x;
  double y;
  double z;
  double w;

    bool operator<(const particle& a) const
    {
        return x < a.x;
    }

};


int main(int argc, char *argv[])
{
  int N=20000000;
  clock_t start,stop;

  vector<particle> myvec(N);
  vector<particle>::iterator cii;
  //Set vector values
  for (cii = myvec.begin(); cii != myvec.end(); ++cii)
  {
    cii->x =1.0*rand()/RAND_MAX;
    cii->y =1.0*rand()/RAND_MAX;
    cii->z =1.0*rand()/RAND_MAX;
    cii->w =1.0*rand()/RAND_MAX;
 }


  list<particle> mylist(N);
  list<particle>::iterator dii;

   //Set list values
  for (cii=myvec.begin(),dii = mylist.begin(); dii != mylist.end() && cii!=myvec.end(); ++dii, ++cii)
  {
      dii->x =cii->x;
      dii->y =cii->y;
          dii->z =cii->z;
      dii->w =cii->w;
 }


  //Sort the vector 

  start=clock();
  sort(myvec.begin(),myvec.end());
  stop=clock();
  cout<<"Time for sorting vector "<<(stop-start)/(double) CLOCKS_PER_SEC<<endl;



  //Sort the list
  start=clock();
  mylist.sort();
  stop=clock();
  cout<<"Time for sorting list "<<(stop-start)/(double) CLOCKS_PER_SEC<<endl;



  return 0;
}

回答1:

No a std::vector is not sorted using merge sort (in most implementations; the standard doesn't specify the algorithm).

std::list does not have O(1) random access, so it cannot use algorithms like Quick sort* which requires O(1) random access to be fast (this is also why std::sort doesn't work on std::list.)

With this, you'll have to use algorithms that forward iteration is enough, such as the Merge sort**.

And merge sort is typically slower [1][2].

See also: what's the difference between list.sort and std::sort?

*: libstdc++ actually uses introsort.
**: libstdc++ actually uses a variant of merge sort



回答2:

I'm really not a C++ programmer, but my understanding is that std::vector has different performance characteristics from std::list. Specifically (as @Martinho commented):

std::vector has O(1) random access, while std::list doesn't.


From cplusplus.com (I'm sure there are less sketchy references out there, feel free to chime in):

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time).

and:

...Advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).


回答3:

A vector packs things closer in memory than a list does. This results in a more cache-friendly access pattern during sorting.



回答4:

list::sort and std::sort on vectors don't use the same algorithm.

std::sort uses a sorting algorithm that requires random-access iterators, such as the ones required by std::vector, but not by std::list.

list::sort is specialized for lists; it usually implements a merge sort, which does not require random access.

The total number of comparisons is O(n log n) for both algorithms (I say that without knowing the exact algorithm used by my compiler's std::sort implementation). The total number of swaps is O(n log n) as well, but for std::sort, that means O(n log n) calls to copy constructor/assignment operator, whereas for list::sort, it's a pointer operation. Your structure is way too small for this advantage to pay off. I assume that as soon as you put something with a non-trivial copy constructor into the struct (maybe a std::string is enough), std::list will win.

EDIT: One std::string member initialised with a random double converted to text seems to be about the break-even point on my machine (x86_64-linux, gcc 4.6.2)



回答5:

Vectors will allow constant time element swapping as well as constant time random access. Lists take linear time to random access while having (probably) a touch more overhead for the swap with pointer updates. My guess is the sort is doing a bunch of swaps. Also, vectors are more efficient at moving large parts of themselves around in memory.

I'd be curious if swapping an slist<> would go faster than a list due to the slightly less pointer overhead.



标签: c++ stl vector