What would be the fastest portable bi-directional mechanism for inter-process communication where threads in one application need to communicate with multiple threads in another application on the same computer, and the communicating threads can be on different physical CPUs?
I assume that it would involve shared memory, a circular buffer, and shared synchronization mechanisms.
But shared mutexes are very expensive to synchronize when threads are running on different physical CPUs (and there is only a limited number of them, too).
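
To make the question concrete, here is a minimal sketch of the kind of mutex-free circular buffer I have in mind. It is a single-producer/single-consumer ring synchronized only with atomic indices. All names (`SpscRing`, `try_push`, `try_pop`, `run_demo`) are hypothetical, and the 64-byte cache line size is an assumption. For simplicity the demo uses two threads in one process in place of the two applications. Placing the struct in a shared memory region (for example via POSIX `shm_open`/`mmap`) is assumed and not shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical single-producer/single-consumer ring; names are illustrative.
template <typename T, std::size_t N>
struct SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    // Atomics shared between processes should be lock-free (C++17 check).
    static_assert(std::atomic<std::size_t>::is_always_lock_free,
                  "index type must be lock-free");

    // Separate cache lines (64 bytes assumed) to limit false sharing.
    alignas(64) std::atomic<std::size_t> head{0};  // written by consumer only
    alignas(64) std::atomic<std::size_t> tail{0};  // written by producer only
    alignas(64) T slots[N];

    bool try_push(const T& v) {
        const std::size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N) return false;  // full
        slots[t & (N - 1)] = v;
        tail.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;  // empty
        out = slots[h & (N - 1)];
        head.store(h + 1, std::memory_order_release);  // release the slot
        return true;
    }
};

// Sends 1..count through the ring and returns the sum seen by the consumer.
std::uint64_t run_demo(std::uint32_t count) {
    SpscRing<std::uint32_t, 1024> ring;
    std::uint64_t sum = 0;

    std::thread producer([&] {
        for (std::uint32_t i = 1; i <= count; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        for (std::uint32_t expected = 1; expected <= count; ++expected) {
            std::uint32_t v = 0;
            while (!ring.try_pop(v)) std::this_thread::yield();
            assert(v == expected);  // items arrive in FIFO order
            sum += v;
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

With this design each ring carries one direction between exactly one writer and one reader, so bi-directional traffic between many threads would presumably need one ring per thread pair per direction. The sketch only spins and yields; how to block efficiently when a ring is empty or full is left open.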