The documentation for Boost.Context in Boost v1.59 reports the following performance comparison results:
+----------+----------------------+-------------------+-------------------+----------------+
| Platform | ucontext_t | fcontext_t | execution_context | windows fibers |
+----------+----------------------+-------------------+-------------------+----------------+
| i386 | 708 ns / 754 cycles | 37 ns / 37 cycles | ns / cycles | ns / cycles |
| x86_64 | 547 ns / 1433 cycles | 8 ns / 23 cycles | 16 ns / 46 cycles | ns / cycles |
+----------+----------------------+-------------------+-------------------+----------------+
[link]
I believe the source code for these experiments is hosted on GitHub.
My question is, why is the overhead for ucontext 20x higher than the Boost library's implementation? I can't see any obvious reason why there would be such a big difference. Is the Boost implementation using some low-level trick that the ucontext implementers missed, or is something else happening here?
The Boost documentation explains why Boost.Context is faster than the deprecated ucontext_t interfaces. In the Rationale section, you'll find this important note:
Note
Context switches do not preserve the signal mask on UNIX systems.
and, in the comparison with makecontext in Other APIs:
ucontext_t preserves signal mask between context switches which involves system calls consuming a lot of CPU cycles.
As indicated, swapcontext does preserve the signal mask, which requires a system call and all the overhead that entails. Since preserving the signal mask was precisely the point of the ucontext_t functions, the cost cannot be described as an oversight. (If you don't want to preserve the signal mask, you can use setjmp and longjmp, which POSIX does not require to save or restore it.)
By the way, the ucontext_t functions were deprecated in POSIX edition 6 and removed in edition 7, because (1) the makecontext interface requires an obsolescent feature of C, which is not available at all in C++; (2) the interfaces are rarely used; and (3) coroutines can be implemented using POSIX threads. (See the note in POSIX edition 6.) (Clearly, threads are not an ideal mechanism for implementing coroutines, but neither is an interface that relies on an obsolescent feature.)
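For readers who have not used makecontext, the following sketch shows where the awkwardness in point (1) comes from. It assumes Linux with glibc, and the names worker and run_with_args are invented for the example. The entry point is declared as taking no arguments, yet the arguments are passed variadically as ints, so a function that actually takes parameters has to be cast to the wrong type.

```c
#include <ucontext.h>

/* Hypothetical names for illustration; assumes Linux/glibc. */
static ucontext_t caller_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int sum_seen;

/* The entry point really takes two ints... */
static void worker(int a, int b) {
    sum_seen = a + b;
    /* Returning resumes the context named in uc_link. */
}

static int run_with_args(int a, int b) {
    sum_seen = 0;
    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &caller_ctx;

    /* ...but makecontext wants void (*)(void), so a cast is needed.
       The trailing arguments must all be of type int. */
    makecontext(&worker_ctx, (void (*)(void))worker, 2, a, b);

    if (swapcontext(&caller_ctx, &worker_ctx) != 0)
        return -1;
    return sum_seen;
}
```

Because only int arguments are allowed, passing a pointer portably on a 64-bit platform usually means splitting it into two ints or handing it over through a global.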