Comprehensive vector vs linked list benchmark for

Posted 2019-01-19 22:21

Question:

So I am aware of this question, and others on SO that deal with this issue, but most of those deal with the complexities of the data structures (just to copy here, linked lists theoretically have O(

I understand the complexities would seem to indicate that a list would be better, but I am more concerned with the real world performance.

Note: This question was inspired by slides 45 and 46 of Bjarne Stroustrup's presentation at Going Native 2012 where he talks about how processor caching and locality of reference really help with vectors, but not at all (or enough) with lists.

Question: Is there a good way to test this using CPU time as opposed to wall time, and getting a decent way of "randomly" inserting and deleting elements that can be done beforehand so it does not influence the timings?

As a bonus, it would be nice to be able to apply this to two arbitrary data structures (say vector and hash maps or something like that) to find the "real world performance" on some hardware.
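One way to keep the "random" part out of the timings is to generate the whole workload before the clock starts, then run the same workload against each container. The following is only a minimal sketch under that assumption. The names make_workload and timed_sorted_insert are made up for illustration, and the value range 0 to 1000000 is arbitrary.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <list>
#include <random>
#include <vector>

// Hypothetical helper: build the whole workload up front so that random
// number generation never happens inside the timed region. The fixed seed
// makes every run (and every container) see the same values.
std::vector<int> make_workload(std::size_t n, unsigned seed = 1234) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> values(n);
    for (int &v : values) v = dist(gen);
    return values;
}

// Hypothetical helper: insert every value at its sorted position and return
// the CPU seconds reported by std::clock(). Works for any sequence container
// that offers begin(), end() and insert(iterator, value).
template <class Container>
double timed_sorted_insert(Container &c, const std::vector<int> &values) {
    const std::clock_t start = std::clock();
    for (int v : values) {
        auto pos = std::lower_bound(c.begin(), c.end(), v);
        c.insert(pos, v);
    }
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling timed_sorted_insert on a std::vector<int>, a std::list<int> and a std::deque<int> with the same workload gives directly comparable numbers. A container without positional insert (a hash map, say) would need its own timed function, but the pre-generated workload can be reused as is.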

Answer 1:

I guess if I were going to test something like this, I'd probably start with code something on this order:

#include <list>
#include <vector>
#include <algorithm>
#include <deque>
#include <time.h>
#include <stdlib.h>   // srand, rand
#include <iostream>
#include <iterator>

static const int size = 30000;

template <class T>
double insert(T &container) {
    srand(1234);
    clock_t start = clock();
    for (int i=0; i<size; ++i) {
        int value = rand();
        // T::iterator is a dependent type, so it needs the typename keyword.
        typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
        container.insert(pos, value);
    }
// uncomment the following to verify correct insertion (in a small container).
//  std::copy(container.begin(), container.end(), std::ostream_iterator<int>(std::cout, "\t"));
    return double(clock()-start)/CLOCKS_PER_SEC;
}


template <class T>
double del(T &container) {
    srand(1234);
    clock_t start = clock();
    for (int i=0; i<size/2; ++i) {
        int value = rand();
        // Same seed as insert(), so every value looked up here is present in the container.
        typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
        container.erase(pos);
    }
    return double(clock()-start)/CLOCKS_PER_SEC;
}

int main() { 
    std::list<int> l;
    std::vector<int> v;
    std::deque<int> d;

    std::cout << "Insertion time for list: " << insert(l) << "\n";
    std::cout << "Insertion time for vector: " << insert(v) << "\n";
    std::cout << "Insertion time for deque: " << insert(d) << "\n\n";

    std::cout << "Deletion time for list: " << del(l) << '\n';
    std::cout << "Deletion time for vector: " << del(v) << '\n';
    std::cout << "Deletion time for deque: " << del(d) << '\n';

    return 0;
}

Since it uses clock, this should give processor time not wall time (though some compilers such as MS VC++ get that wrong). It doesn't try to measure the time for insertion exclusive of time to find the insertion point, since 1) that would take a bit more work and 2) I still can't figure out what it would accomplish. It's certainly not 100% rigorous, but given the disparity I see from it, I'd be a bit surprised to see a significant difference from more careful testing. For example, with MS VC++, I get:

Insertion time for list: 6.598
Insertion time for vector: 1.377
Insertion time for deque: 1.484

Deletion time for list: 6.348
Deletion time for vector: 0.114
Deletion time for deque: 0.82

With gcc I get:

Insertion time for list: 5.272
Insertion time for vector: 0.125
Insertion time for deque: 0.125

Deletion time for list: 4.259
Deletion time for vector: 0.109
Deletion time for deque: 0.109

Factoring out the search time would be somewhat non-trivial because you'd have to time each iteration separately. You'd need something more precise than clock (usually is) to produce meaningful results from that (more on the order of reading a clock cycle register). Feel free to modify for that if you see fit -- as I mentioned above, I lack motivation because I can't see how it's a sensible thing to do.
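For anyone who does want to try it, here is a rough sketch of timing the search and the insertion separately on every iteration. It is not the code that produced the numbers above. It swaps in std::chrono::steady_clock, which measures elapsed time rather than CPU time, and the names PhaseTimes and split_insert_timing are invented for this example. The cost of reading the clock is itself included in both totals, so the results are only indicative.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <list>
#include <vector>

// Hypothetical result type: accumulated seconds for each phase.
struct PhaseTimes {
    double search_seconds;
    double insert_seconds;
};

// Hypothetical helper: same sorted-insert loop as insert() above, but the
// search and the insertion are timed separately on every iteration.
// Assumption: steady_clock is fine-grained enough on the target platform.
template <class Container>
PhaseTimes split_insert_timing(Container &c, int count, unsigned seed = 1234) {
    using Clock = std::chrono::steady_clock;
    std::chrono::duration<double> search(0.0), insertion(0.0);
    std::srand(seed);
    for (int i = 0; i < count; ++i) {
        const int value = std::rand();
        const Clock::time_point t0 = Clock::now();
        auto pos = std::lower_bound(c.begin(), c.end(), value);
        const Clock::time_point t1 = Clock::now();
        c.insert(pos, value);
        const Clock::time_point t2 = Clock::now();
        search += t1 - t0;      // time spent finding the insertion point
        insertion += t2 - t1;   // time spent in insert() itself
    }
    PhaseTimes result;
    result.search_seconds = search.count();
    result.insert_seconds = insertion.count();
    return result;
}
```

Calling split_insert_timing(l, size) on the std::list and split_insert_timing(v, size) on the std::vector would show how the total divides between walking to the position and performing the insertion.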



Answer 2:

This is the program I wrote after watching that talk. I tried running each timing test in a separate process to make sure the allocators weren't doing anything sneaky to alter performance. I have amended the test to allow timing of the random number generation. If you are concerned it is affecting the results significantly, you can time it and subtract the time spent there from the rest of the timings. But I get zero time spent there for anything but very large N. I used getrusage(), which I am pretty sure isn't portable to Windows, but it would be easy to substitute in something using clock() or whatever you like.

#include <assert.h>
#include <algorithm>
#include <iostream>
#include <list>
#include <string>
#include <vector>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>


void f(size_t const N)
{
    std::vector<int> c;
    //c.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
        c.insert(p, r);
    }
}

void g(size_t const N)
{
    std::list<int> c;
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
        c.insert(p, r);
    }
}

int h(size_t const N)
{
    int r = 0;  // initialized so the return value is defined even when N == 0
    for (size_t i = 0; i < N; ++i) {
        r = rand();
    }
    return r;
}

double usage()
{
    struct rusage u;
    if (getrusage(RUSAGE_SELF, &u) == -1) std::abort();
    return
        double(u.ru_utime.tv_sec) + (u.ru_utime.tv_usec / 1e6) +
        double(u.ru_stime.tv_sec) + (u.ru_stime.tv_usec / 1e6);
}


int
main(int argc, char* argv[])
{
    assert(argc >= 3);
    std::string const sel = argv[1];
    size_t const N = atoi(argv[2]);

    double t0, t1;
    srand(127);

    if (sel == "vector") {
        t0 = usage();
        f(N);
        t1 = usage();
    } else if (sel == "list") {
        t0 = usage();
        g(N);
        t1 = usage();
    } else if (sel == "rand") {
        t0 = usage();
        h(N);
        t1 = usage();
    } else {
        std::abort();
    }

    std::cout
        << (t1 - t0)
        << std::endl;

    return 0;
}

To get a set of results I used the following shell script.

seq=`perl -e 'for ($i = 10; $i < 100000; $i *= 1.1) { print int($i), " "; }'`
for i in $seq; do
    vt=`./a.out vector $i`
    lt=`./a.out list $i`
    echo $i $vt $lt
done
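For platforms without getrusage(), the substitution mentioned above could look roughly like the following. This is a sketch, not part of the original program, and usage_clock is a made-up name. It relies only on std::clock(), so it inherits the caveat from Answer 1 that some implementations (MS VC++ is the one named there) do not report processor time from clock.

```cpp
#include <cstdlib>
#include <ctime>

// Hypothetical drop-in for usage(): CPU seconds used by the process so far,
// as reported by std::clock(). Unlike the getrusage() version it cannot
// separate user time from system time.
double usage_clock() {
    const std::clock_t ticks = std::clock();
    // std::clock() returns (clock_t)(-1) when processor time is unavailable.
    if (ticks == static_cast<std::clock_t>(-1)) std::abort();
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}
```

Replacing each call to usage() in main with usage_clock() leaves the rest of the program and the shell script unchanged.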