While reading the asio source code, I became curious about how asio keeps data synchronized between threads when an implicit strand is formed. Here is the relevant code in asio:
io_service::run

    mutex::scoped_lock lock(mutex_);
    std::size_t n = 0;
    for (; do_run_one(lock, this_thread, ec); lock.lock())
      if (n != (std::numeric_limits<std::size_t>::max)())
        ++n;
    return n;
io_service::do_run_one

    while (!stopped_)
    {
      if (!op_queue_.empty())
      {
        // Prepare to execute first handler from queue.
        operation* o = op_queue_.front();
        op_queue_.pop();
        bool more_handlers = (!op_queue_.empty());

        if (o == &task_operation_)
        {
          task_interrupted_ = more_handlers;

          if (more_handlers && !one_thread_)
          {
            if (!wake_one_idle_thread_and_unlock(lock))
              lock.unlock();
          }
          else
            lock.unlock();

          task_cleanup on_exit = { this, &lock, &this_thread };
          (void)on_exit;

          // Run the task. May throw an exception. Only block if the operation
          // queue is empty and we're not polling, otherwise we want to return
          // as soon as possible.
          task_->run(!more_handlers, this_thread.private_op_queue);
        }
        else
        {
          std::size_t task_result = o->task_result_;

          if (more_handlers && !one_thread_)
            wake_one_thread_and_unlock(lock);
          else
            lock.unlock();

          // Ensure the count of outstanding work is decremented on block exit.
          work_cleanup on_exit = { this, &lock, &this_thread };
          (void)on_exit;

          // Complete the operation. May throw an exception. Deletes the object.
          o->complete(*this, ec, task_result);

          return 1;
        }
      }
In do_run_one, the mutex is always unlocked before the handler executes. With an implicit strand the handlers will not execute concurrently, but the problem remains: thread A runs a handler that modifies some data, and thread B runs the next handler, which reads the data modified by thread A. Without the protection of the mutex, how does thread B see the changes made by thread A? Unlocking the mutex before the handler executes does not, by itself, establish a happens-before relationship between the two threads' accesses to the data the handlers touch. Digging further, I found that handler execution uses something called fenced_block:
    completion_handler* h(static_cast<completion_handler*>(base));
    ptr p = { boost::addressof(h->handler_), h, h };

    BOOST_ASIO_HANDLER_COMPLETION((h));

    // Make a copy of the handler so that the memory can be deallocated before
    // the upcall is made. Even if we're not about to make an upcall, a
    // sub-object of the handler may be the true owner of the memory associated
    // with the handler. Consequently, a local copy of the handler is required
    // to ensure that any owning sub-object remains valid until after we have
    // deallocated the memory here.
    Handler handler(BOOST_ASIO_MOVE_CAST(Handler)(h->handler_));
    p.h = boost::addressof(handler);
    p.reset();

    // Make the upcall if required.
    if (owner)
    {
      fenced_block b(fenced_block::half);
      BOOST_ASIO_HANDLER_INVOCATION_BEGIN(());
      boost_asio_handler_invoke_helpers::invoke(handler, handler);
      BOOST_ASIO_HANDLER_INVOCATION_END;
    }
What is this? I know a fence is a synchronization primitive supported by C++11, but this fence is written entirely by asio itself. Does this fenced_block do the job of data synchronization?
UPDATE

After googling and reading this and this, asio does indeed use memory fence primitives to synchronize data between threads, which is faster than holding the lock until the handler finishes executing (the speed difference is measurable on x86). In fact, Java's volatile keyword is implemented by inserting a memory barrier after each write to, and before each read of, the variable, in order to establish a happens-before relationship.

If someone could briefly describe asio's memory fence implementation, or add anything I missed or misunderstood, I will accept the answer.
Before an operation invokes the user's handler, Boost.Asio uses a memory fence to provide the appropriate memory reordering guarantees without forcing mutually exclusive execution of the handlers. Thus, thread B will observe changes to memory that occurred within the context of thread A.
C++03 did not specify requirements for memory visibility with regard to multi-threaded execution. However, C++11 defines these requirements in § 1.10 Multi-threaded executions and data races, as well as in the Atomic operations and Thread support library sections. Boost and C++11 mutexes perform the appropriate memory reordering. For other mutex implementations, it is worth checking the library's documentation to verify that the appropriate memory reordering occurs.
Boost.Asio's memory fences are an implementation detail, and thus always subject to change. Boost.Asio abstracts itself from the architecture/compiler-specific implementations through a series of conditional defines within asio/detail/fenced_block.hpp, where only a single memory barrier implementation is included. The underlying implementation is contained within a class, for which a fenced_block alias is created via a typedef.
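The selection logic looks roughly like the following condensed sketch. This is a fragment written from memory, not the actual asio source: the macro names, conditions, and namespace vary between Boost versions, and only the class names mentioned here (win_fenced_block, gcc_x86_fenced_block, null_fenced_block) are taken from the headers discussed below.

```cpp
// Hypothetical condensed illustration of asio/detail/fenced_block.hpp's
// structure: exactly one platform-specific header is pulled in, and
// fenced_block becomes a typedef for the class that header defines.
#if defined(BOOST_ASIO_WINDOWS)
# include <boost/asio/detail/win_fenced_block.hpp>
  namespace illustrative { typedef boost::asio::detail::win_fenced_block fenced_block; }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# include <boost/asio/detail/gcc_x86_fenced_block.hpp>
  namespace illustrative { typedef boost::asio::detail::gcc_x86_fenced_block fenced_block; }
#else
# include <boost/asio/detail/null_fenced_block.hpp>
  namespace illustrative { typedef boost::asio::detail::null_fenced_block fenced_block; }
#endif
```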
The implementations of the memory barriers are specific to the architecture and compiler. Boost.Asio has a family of asio/detail/*_fenced_block.hpp header files. For example, win_fenced_block uses InterlockedExchange for Borland; otherwise it uses the xchg assembly instruction, which has an implicit lock prefix when used with a memory address. For gcc_x86_fenced_block, Boost.Asio uses inline assembly with a "memory" clobber.

If you find yourself needing to use a fence, then consider the Boost.Atomic library. Introduced in Boost 1.53, Boost.Atomic provides an implementation of thread and signal fences based on the C++11 standard. Boost.Asio has been using its own implementation of memory fences since before Boost.Atomic was added to Boost. Also, the Boost.Asio fences are scope-based: fenced_block will perform an acquire in its constructor and a release in its destructor.