Why does ucontext have such high overhead?

Posted 2019-04-16 16:43

Question:

The documentation for Boost.Context in Boost v1.59 reports the following performance comparison results:

+----------+----------------------+-------------------+-------------------+----------------+
| Platform |      ucontext_t      |    fcontext_t     | execution_context | windows fibers |
+----------+----------------------+-------------------+-------------------+----------------+
| i386     | 708 ns / 754 cycles  | 37 ns / 37 cycles | ns / cycles       | ns / cycles    |
| x86_64   | 547 ns / 1433 cycles | 8 ns / 23 cycles  | 16 ns / 46 cycles | ns / cycles    |
+----------+----------------------+-------------------+-------------------+----------------+


I believe the source code for these experiments is hosted on GitHub.

My question is, why is the overhead for ucontext 20x higher than the Boost library's implementation? I can't see any obvious reason why there would be such a big difference. Is the Boost implementation using some low-level trick that the ucontext implementers missed, or is something else happening here?

Answer 1:

The Boost documentation explains why Boost.Context is faster than the deprecated ucontext_t interfaces. In the Rationale section, you'll find this important note:

Note: Context switches do not preserve the signal mask on UNIX systems.

and, in the comparison with makecontext in Other APIs:

ucontext_t preserves signal mask between context switches which involves system calls consuming a lot of CPU cycles.

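As indicated, swapcontext does preserve the signal mask, which requires a system call on every switch, with all the overhead that entails. Since preserving the mask was precisely the point of the ucontext_t functions, it cannot be described as an oversight. (If you don't want to preserve the signal mask, you can use setjmp and longjmp. Note that POSIX leaves it unspecified whether plain setjmp saves the mask; sigsetjmp with a savemask argument of 0 makes the choice explicit.)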
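The mask-preserving behaviour described above can be observed directly. The following is a minimal sketch, assuming a Linux/glibc-style ucontext implementation; the names `coroutine`, `usr1_is_blocked`, and `masks_are_per_context` are made up for illustration and are not from the Boost benchmark. The coroutine blocks SIGUSR1 and yields; if swapcontext really saves and restores the mask, the main context should still see SIGUSR1 unblocked, and the coroutine should see it blocked again when resumed.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigprocmask etc. under strict -std modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];          /* stack for the coroutine context */
static int co_sees_blocked_after_resume;

/* Query (without changing) the current signal mask. */
static int usr1_is_blocked(void) {
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* NULL set: only read the mask */
    return sigismember(&cur, SIGUSR1);
}

static void coroutine(void) {
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, NULL); /* change the mask inside the coroutine */

    swapcontext(&co_ctx, &main_ctx);      /* yield: saves this mask, restores main's */

    co_sees_blocked_after_resume = usr1_is_blocked();
    /* returning here resumes uc_link, i.e. main_ctx */
}

/* Returns 1 if each context kept its own signal mask across the switches. */
int masks_are_per_context(void) {
    sigset_t unblock;
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);  /* start with SIGUSR1 unblocked */

    int rc = getcontext(&co_ctx);
    assert(rc == 0);
    (void)rc;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);           /* run coroutine up to its yield */
    int main_blocked = usr1_is_blocked();      /* main's mask should be untouched */
    swapcontext(&main_ctx, &co_ctx);           /* resume coroutine until it returns */

    return main_blocked == 0 && co_sees_blocked_after_resume == 1;
}
```

Calling `masks_are_per_context()` from `main` and checking for a return value of 1 is enough to try it. Running the program under a system-call tracer should also show the signal-mask system call issued on each switch, which is the cost the Boost documentation refers to.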

By the way, the ucontext_t functions were deprecated in POSIX edition 6 and removed in edition 7, because (1) the makecontext interface requires an obsolescent feature of C, which is not available at all in C++; (2) the interfaces are rarely used; and (3) coroutines can be implemented using POSIX threads. (See the note in POSIX edition 6.) (Clearly, threads are not an ideal mechanism for implementing coroutines, but neither is an interface which relies on an obsolescent feature.)
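To illustrate point (1): makecontext takes an entry function whose real parameter list is not part of its declared type, followed by a count of int arguments. A common workaround for passing a pointer is to split it into two int-sized halves. The sketch below assumes a glibc-style prototype (entry point declared as `void (*)(void)`, so a cast is needed) and pointers of at most 64 bits; `struct task`, `entry`, and `run_task` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <ucontext.h>

struct task { int input; int result; };   /* hypothetical payload */

static ucontext_t caller_ctx, task_ctx;
static char task_stack[64 * 1024];

/* makecontext forwards only int-sized arguments, so the pointer
   arrives as two halves and is reassembled here. */
static void entry(unsigned int hi, unsigned int lo) {
    uint64_t bits = ((uint64_t)hi << 32) | (uint64_t)lo;
    struct task *t = (struct task *)(uintptr_t)bits;
    t->result = t->input * 2;
    /* returning resumes uc_link, i.e. caller_ctx */
}

int run_task(int input) {
    struct task t = { input, 0 };
    uint64_t bits = (uint64_t)(uintptr_t)&t;

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &caller_ctx;

    /* The cast throws away entry's real parameter list; the interface
       depends on calling a function through a type without a prototype. */
    makecontext(&task_ctx, (void (*)(void))entry, 2,
                (unsigned int)(bits >> 32),
                (unsigned int)(bits & 0xffffffffu));

    swapcontext(&caller_ctx, &task_ctx);   /* run entry to completion */
    return t.result;
}
```

With this sketch, `run_task(21)` returns 42. The cast and the manual splitting are exactly the kind of type-unsafe plumbing that a prototype-based interface would avoid, and C++ has no unprototyped function types to fall back on.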