I wrote something using atomics rather than locks and perplexed at it being so much slower in my case I wrote the following mini test:
#include <pthread.h>
#include <vector>
struct test
{
test(size_t size) : index_(0), size_(size), vec2_(size)
{
vec_.reserve(size_);
pthread_mutexattr_init(&attrs_);
pthread_mutexattr_setpshared(&attrs_, PTHREAD_PROCESS_PRIVATE);
pthread_mutexattr_settype(&attrs_, PTHREAD_MUTEX_ADAPTIVE_NP);
pthread_mutex_init(&lock_, &attrs_);
}
void lockedPush(int i);
void atomicPush(int* i);
size_t index_;
size_t size_;
std::vector<int> vec_;
std::vector<int> vec2_;
pthread_mutexattr_t attrs_;
pthread_mutex_t lock_;
};
void test::lockedPush(int i)
{
pthread_mutex_lock(&lock_);
vec_.push_back(i);
pthread_mutex_unlock(&lock_);
}
void test::atomicPush(int* i)
{
int ii = (int) (i - &vec2_.front());
size_t index = __sync_fetch_and_add(&index_, 1);
vec2_[index & (size_ - 1)] = ii;
}
int main(int argc, char** argv)
{
const size_t N = 1048576;
test t(N);
// for (int i = 0; i < N; ++i)
// t.lockedPush(i);
for (int i = 0; i < N; ++i)
t.atomicPush(&i);
}
If I uncomment the atomicPush operation and run the test with time(1)
I get output like so:
real 0m0.027s
user 0m0.022s
sys 0m0.005s
and if I run the loop calling the atomic thing (the seemingly unnecessary operation is there because i want my function to look as much as possible as what my bigger code does) I get output like so:
real 0m0.046s
user 0m0.043s
sys 0m0.003s
I'm not sure why this is happening as I would have expected the atomic to be faster than the lock in this case...
When I compile with -O3 I see lock and atomic updates as follows:
lock:
real 0m0.024s
user 0m0.022s
sys 0m0.001s
atomic:
real 0m0.013s
user 0m0.011s
sys 0m0.002s
In my larger app though the performance of the lock (single threaded testing) is still doing better regardless though..