How to make a fast context switch from one process to another

Published 2019-04-01 02:34

Question:

I need to run unsafe native code in a sandbox process, and I need to reduce the bottleneck of switching between the two processes. Both processes (controller and sandbox) share two auto-reset events and a coherent view of a mapped file (shared memory) that is used for communication.

To keep this post short, I removed the initialization code from the samples, but the events are created by the controller, duplicated with DuplicateHandle, and sent to the sandbox process before any work starts.
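For reference, the elided initialization looks roughly like this. This is a hedged sketch, not the poster's actual code; `hSandboxProcess` and the out-parameters are illustrative names:

```cpp
#include <windows.h>

// Sketch of the setup the post elides: the controller creates two
// auto-reset events and duplicates them into the sandbox process.
BOOL shareEvents(HANDLE hSandboxProcess,
                 HANDLE *hNewRequest, HANDLE *hAnswer,
                 HANDLE *hNewRequestRemote, HANDLE *hAnswerRemote) {
  // bManualReset = FALSE gives the auto-reset semantics the post relies on.
  *hNewRequest = CreateEvent(NULL, FALSE, FALSE, NULL);
  *hAnswer     = CreateEvent(NULL, FALSE, FALSE, NULL);
  if (!*hNewRequest || !*hAnswer) return FALSE;

  // Duplicate each event into the sandbox process. The resulting handle
  // values are only meaningful inside that process and must be sent to it
  // (e.g. through the shared memory) before the request loop starts.
  return DuplicateHandle(GetCurrentProcess(), *hNewRequest,
                         hSandboxProcess, hNewRequestRemote,
                         0, FALSE, DUPLICATE_SAME_ACCESS)
      && DuplicateHandle(GetCurrentProcess(), *hAnswer,
                         hSandboxProcess, hAnswerRemote,
                         0, FALSE, DUPLICATE_SAME_ACCESS);
}
```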

Controller source:

void inSandbox(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
  int before = *shared;
  for (int i = 0; i < 100000; ++i) {
    // Notify sandbox of a new request and wait for answer.
    SignalObjectAndWait(hNewRequest, hAnswer, INFINITE, FALSE);
  }
  assert(*shared == before + 100000);
}

void inProcess(volatile int *shared) {
  int before = *shared;
  for (int i = 0; i < 100000; ++i) {
    newRequest(shared);
  }
  assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
  // In this test, the request only increments an int.
  (*shared)++;
}

Sandbox source:

void sandboxLoop(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
  // Wait for the first request from controller.
  assert(WaitForSingleObject(hNewRequest, INFINITE) == WAIT_OBJECT_0);
  for(;;) {
    // Perform request.
    newRequest(shared);
    // Notify controller and wait for next request.
    SignalObjectAndWait(hAnswer, hNewRequest, INFINITE, FALSE);
  }
}

void newRequest(volatile int *shared) {
  // In this test, the request only increments an int.
  (*shared)++;
}

Measurements:

  • inSandbox() - 550ms, ~350k context switches, 42% CPU (25% kernel, 17% user).
  • inProcess() - 20ms, ~2k context switches, 55% CPU (2% kernel, 53% user).

The machine is a Core 2 Duo P9700 with 8 GB of memory, running Windows 7 Pro.

An interesting fact is that the sandbox solution uses 42% CPU versus 55% for the in-process solution. Another noteworthy fact is that the sandbox run shows ~350k context switches, far more than the ~200k we would expect from the source code (100k round trips, two switches each).

I need to know whether there is a way to reduce the overhead of transferring control to another process. I already tried pipes instead of events, and they were much worse. I also tried using no events at all, by having the sandbox call SuspendThread(GetCurrentThread()) and the controller call ResumeThread(hSandboxThread) on every request, but performance was similar to using events.

If you have a solution that uses assembly (such as performing a manual context switch) or the Windows Driver Kit, please let me know as well. I don't mind installing a driver to make this faster.

I heard that Google Native Client does something similar, but I have only found this documentation. If you have more information, please let me know.

Answer 1:

The first thing to try is raising the priority of the waiting thread. This should reduce the number of extraneous context switches.
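That could look something like the following. This is a hedged sketch; `hSandboxThread` is assumed to be a thread handle the controller already holds (it is not in the original code):

```cpp
#include <windows.h>

// Boost both sides so the scheduler hands the CPU straight back after
// SignalObjectAndWait instead of running something else in between.
void boostPriorities(HANDLE hSandboxThread) {
  SetThreadPriority(hSandboxThread, THREAD_PRIORITY_ABOVE_NORMAL);
  SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
}
```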

Alternatively, since you're on a 2-core system, spinning instead of waiting on events will make this code much faster, at the cost of overall system throughput and power consumption:

void inSandbox(volatile int *lock, volatile int *shared)
{
  int i, before = *shared;
  for (i = 0; i < 100000; ++i) {
    *lock = 1;                // hand the request to the sandbox
    while (*lock != 0) {      // spin until the sandbox clears it
      YieldProcessor();       // pause hint; be polite to the sibling core
    }
  }
  assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
  // In this test, the request only increments an int.
  (*shared)++;
}

void sandboxLoop(volatile int *lock, volatile int *shared)
{
  for (;;) {
    while (*lock != 1) {      // spin until the controller posts a request
      YieldProcessor();       // pause hint; be polite to the sibling core
    }
    newRequest(shared);
    *lock = 0;                // publish the answer back to the controller
  }
}

In this scenario, you should probably set thread affinity masks and/or lower the priority of the spinning thread so that it doesn't compete with the busy thread for CPU time.
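A sketch of that setup, with illustrative core numbers (on the 2-core machine from the question, masks 1 and 2 select core 0 and core 1 respectively):

```cpp
#include <windows.h>

// Pin each side to its own core so they never preempt each other, and
// lower the sandbox's priority so its spinning cannot starve anything else.
void configureForSpinning(BOOL isSandbox) {
  SetThreadAffinityMask(GetCurrentThread(), isSandbox ? 2 : 1);
  if (isSandbox)
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
}
```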

Ideally, you'd use a hybrid approach. When one side is going to be busy for a while, let the other side wait on an event so that other processes can get some CPU time. You could trigger the event a little ahead of time (using the spinlock to retain synchronization) so that the other thread will be ready when you are.
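The spin-then-block idea can be sketched portably, with std::atomic and a condition_variable standing in for the Win32 auto-reset event. All names here are illustrative, and the spin count is an arbitrary tuning knob:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// One-directional signal: spin briefly in the hope of catching the peer's
// store without a kernel transition, then fall back to a blocking wait.
struct HybridSignal {
  std::atomic<bool> flag{false};
  std::mutex m;
  std::condition_variable cv;

  void signal() {
    {
      // Taking the mutex around the store avoids a missed-wakeup race
      // with a waiter that has just checked the predicate.
      std::lock_guard<std::mutex> lk(m);
      flag.store(true, std::memory_order_release);
    }
    cv.notify_one();
  }

  void wait() {
    // Fast path: spin for a while; if the peer answers quickly we never
    // touch the mutex or the kernel.
    for (int i = 0; i < 4000; ++i) {
      if (flag.load(std::memory_order_acquire) &&
          flag.exchange(false, std::memory_order_acquire))
        return;
    }
    // Slow path: block so other processes can get CPU time.
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return flag.load(std::memory_order_acquire); });
    flag.store(false, std::memory_order_relaxed);
  }
};
```

With two of these (one per direction), the controller's loop becomes `toWorker.signal(); toMaster.wait();` and the worker's becomes `toWorker.wait(); work(); toMaster.signal();`, mirroring the SignalObjectAndWait handshake in the question.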