C++11 vector performance issue (with code ex

2020-02-11 17:27发布

I notice that vector is much slower than bool array when running the following code.

int main() 
{
    int count = 0;
    int n = 1500000;
    // slower with c++ vector<bool>
    /*vector<bool> isPrime;
    isPrime.reserve(n);
    isPrime.assign(n, true);
    */
    // faster with bool array 
    bool* isPrime = new bool[n];

    for (int i = 0; i < n; ++i)
        isPrime[i] = true;


    for (int i = 2; i< n; ++i) {
        if (isPrime[i])
            count++;
        for (int j =2; i*j < n; ++j )
            isPrime[i*j] = false;
    }

    cout <<  count << endl;
    return 0;
}

Is there some way that I can do to make vector<bool> faster ? Btw, both std::vector::push_back and std::vector::emplace_back are even slower than std::vector::assign.

3条回答
爱情/是我丢掉的垃圾
2楼-- · 2020-02-11 18:09

vector<bool> can be high performance, but isn't required to be. For vector<bool> to be efficient, it needs to operate on many bools at a time (e.g. isPrime.assign(n, true)), and the implementor has had to put loving care into it. Indexing individual bools in a vector<bool> is slow.

Here is a prime finder that I wrote a while back using vector<bool> and clang + libc++ (the libc++ part is important):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

std::vector<bool>
init_primes()
{
    std::vector<bool> primes(0x80000000, true);
    primes[0] = false;
    primes[1] = false;
    const auto pb = primes.begin();
    const auto pe = primes.end();
    const auto sz = primes.size();
    size_t i = 2;
    while (true)
    {
        size_t j = i*i;
        if (j >= sz)
            break;
        do
        {
            primes[j] = false;
            j += i;
        } while (j < sz);
        i = std::find(pb + (i+1), pe, true) - pb;
    }
    return primes;
}

int
main()
{
    using namespace std::chrono;
    using dsec = duration<double>;
    auto t0 = steady_clock::now();
    auto p = init_primes();
    auto t1 = steady_clock::now();
    std::cout << dsec(t1-t0).count() << "\n";
}

This executes for me in about 28s (-O3). When I change it to return a vector<char> instead, the execution time goes up to about 44s.

If you run this using some other std::lib, you probably won't see this trend. On libc++ algorithms such as std::find have been optimized to search a word of bits at a time, instead of bit at a time.

See http://howardhinnant.github.io/onvectorbool.html for more details on what std algorithms could be optimized by your vendor.

查看更多
仙女界的扛把子
3楼-- · 2020-02-11 18:17

vector<bool> may have a template specialization and may be implemented using bit array to save space. Extracting and saving a bit and converting it from / to bool may cause the performance drop you are observing. If you use std::vector::push_back, you are resizing the vector which will cause even worse performance. Next performance killer may be assign (Worst complexity: Linear of first argument), instead use operator [] (Complexity: constant).

On the other hand, bool [] is guaranteed to be array of bool.

And you should resize to n instead of n-1 to avoid undefined behaviour.

查看更多
Root(大扎)
4楼-- · 2020-02-11 18:20

std::vector<bool> can have various performance issues (e.g. take a look at https://isocpp.org/blog/2012/11/on-vectorbool).

In general you can:

  • use std::vector<std::uint8_t> instead of std::vector<bool> (give a try to std::valarray<bool> also).

    This requires more memory and is less cache-friendly but there isn't a overhead (in the form of bit manipulation) to access a single value, so there are situations in which it works better (after all it's just like your array of bool but without the nuisance of memory management)

  • use std::bitset if you know at compile time how large your boolean array is going to be (or if you can at least establish a reasonable upper bound)
  • if Boost is an option try boost::dynamic_bitset (the size can be specified at runtime)

But for speed optimizations you have to test...

With your specific example I can confirm a performance difference only when optimizations are turned off (of course this isn't the way to go).

Some tests with g++ v4.8.3 and clang++ v3.4.5 on an Intel Xeon system (-O3 optimization level) give a different picture:

                    time (ms)
                 G++      CLANG++
array of bool    3103     3010
vector<bool>     2835     2420    // not bad!
vector<char>     3136     3031    // same as array of bool
bitset           2742     2388    // marginally better

(time elapsed for 100 runs of the code in the answer)

std::vector<bool> doesn't look so bad (source code here).

查看更多
登录 后发表回答