Slow communication using shared memory between user mode and kernel mode

Published 2019-08-19 10:08

Question:

I am running a thread in the Windows kernel that communicates with an application over shared memory. Everything works fine except that the communication is slow due to a Sleep loop. I have been investigating spin locks, mutexes, and the Interlocked functions but can't really figure this one out. I have also considered Windows events but don't know how they perform. Please advise on a faster solution that keeps the communication over shared memory, possibly using Windows events.

KERNEL CODE

typedef struct _SHARED_MEMORY
{
    BOOLEAN mutex;
    CHAR data[BUFFER_SIZE];
} SHARED_MEMORY, *PSHARED_MEMORY;

ZwCreateSection(...)
ZwMapViewOfSection(...)

while (TRUE) {
    if (((PSHARED_MEMORY)SharedSection)->mutex == TRUE) {
      //... do work...
      ((PSHARED_MEMORY)SharedSection)->mutex = FALSE;
    }
    KeDelayExecutionThread(KernelMode, FALSE, &PollingInterval);
}

APPLICATION CODE

OpenFileMapping(...)
MapViewOfFile(...)

...

RtlCopyMemory(&SM->data, WriteData, Size);
SM->mutex = TRUE;

while (SM->mutex != FALSE) {
    Sleep(1); // Slow and removing it will cause an infinite loop
}

RtlCopyMemory(ReadData, &SM->data, Size);

UPDATE 1: Currently this is the fastest solution I have come up with:

while(InterlockedCompareExchange(&SM->mutex, FALSE, FALSE));

However, I find it odd that you need to do an exchange, and that there is no function that only does the compare (a plain read).

Answer 1:

You don't want to use InterlockedCompareExchange. It burns the CPU, saturates core resources that might be needed by another thread sharing that physical core, and can saturate inter-core buses.

You do need to do two things:

1) Write an InterlockedGet function and use it.

2) Prevent the loop from burning CPU resources and from taking the mother of all mispredicted branches when it finally gets unblocked.

For 1, this is known to work on all compilers that support InterlockedCompareExchange, at least last time I checked:

__inline static int InterlockedGet(int *val)
{
    // The volatile cast forces a fresh load from memory on every call,
    // preventing the compiler from caching the value in a register.
    return *((volatile int *)val);
}

For 2, put this as the body of the wait loop:

__asm
{
    rep nop
}

For x86 CPUs, this is specified to solve the resource saturation and branch prediction problems.

Putting it together:

while ((*(volatile int *) &SM->mutex) != FALSE) {
    __asm
    {
        rep nop
    }
}

Change int as needed if it's not appropriate.