I'm trying to speed up a program by using std::async. Let's say I have a function
T* f (const T& t1, const T& t2, const T& t3);
where T is a type that is expensive to copy. I have several independent calls to f with different arguments, and I try to parallelize them with std::async roughly like this (m_futures is a std::vector of futures of the appropriate type):
    for (...) {
        m_futures.push_back (
            std::async(
                std::launch::async,
                f,
                a, b, c));
    }
I observed that the above code slows down the execution of my program. I stepped through it with gdb, and when the future is created, the copy constructor of T is called three times. Why is that? The arguments a, b and c are heap-allocated, but maybe the compiler does not know that? Can I make it explicit somehow?
Does std::async always create copies of its arguments, even when the function takes them by const reference? Can I avoid this somehow? In my naive mind, only a pointer should be passed to each invocation of the function (which only reads from that memory anyway). I'm using gcc 4.6.3 on Linux, if that matters.
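For reference, here is a minimal sketch of the situation, not my real code. Big is a hypothetical stand-in for T that counts copy constructions, and copies_by_value and copies_with_cref are made-up helper names. As far as I understand, std::async stores decayed copies of its arguments so the new thread cannot end up with dangling references, and wrapping each argument in std::cref appears to be the way to opt out of that. This assumes the caller keeps a, b and c alive until the future has been waited on.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>  // std::cref
#include <future>
#include <vector>

// Hypothetical stand-in for the expensive type T; it counts copy constructions.
struct Big {
    static std::atomic<int> copies;
    std::vector<int> data;
    explicit Big(std::size_t n) : data(n, 1) {}
    Big(const Big& other) : data(other.data) { ++copies; }
};
std::atomic<int> Big::copies{0};

// Same shape as f in the question: only reads its three arguments.
Big* f(const Big& t1, const Big& t2, const Big& t3) {
    return new Big(t1.data.size() + t2.data.size() + t3.data.size());
}

// Launches f the way the question does and reports how many copies of Big were made.
int copies_by_value(const Big& a, const Big& b, const Big& c) {
    Big::copies = 0;
    std::future<Big*> fut = std::async(std::launch::async, f, a, b, c);
    delete fut.get();  // wait for the task, then free its result
    return Big::copies;
}

// Same launch, but each argument is wrapped in std::cref so that only a
// reference wrapper is stored. a, b and c must outlive the task.
int copies_with_cref(const Big& a, const Big& b, const Big& c) {
    Big::copies = 0;
    std::future<Big*> fut = std::async(std::launch::async, f,
                                       std::cref(a), std::cref(b), std::cref(c));
    delete fut.get();
    return Big::copies;
}
```

Calling copies_by_value(a, b, c) on three Big objects should report at least three copies (one per argument, possibly more depending on the library), while copies_with_cref(a, b, c) should report none. With gcc on Linux this needs -std=c++0x and -pthread.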