Problem
I am creating a Windows 7 based C# WPF application using .NET 4.5, and one of its major features is to call certain functions that interface with custom hardware at a set of user-defined cycle times. For example, the user might choose two functions to be called every 10 or 20 milliseconds and another every 500 milliseconds. The smallest cycle time the user can choose is 1 millisecond.
At first it seemed that the timings were accurate and the functions were called every 1 millisecond as required. But we later noticed that about 1-2% of the timings were not accurate: some functions were called just 5 milliseconds late, and others could be up to 100 milliseconds late. Even with cycle times greater than 1 ms, we faced the problem that the thread was asleep at the time it should have called the external function (a 20 ms function could be called 50 ms late because the thread was sleeping and didn't call the function).
After analysis we concluded that these delays were sporadic, with no noticeable pattern, and that the main possible reason behind them was OS scheduling and thread context switching; in other words, our thread wasn't awake all the time like we need it to be.
As Windows 7 is not an RTOS, we need to find out whether we can work around this problem somehow. But we do know for sure that this problem is fixable on Windows, as we use other tools with similar functionality that can meet those timing constraints with a maximum of 0.7 ms error tolerance.
Our application is multithreaded, with a maximum of about 30 threads running at the same time; its current peak CPU usage is about 13%.
Attempted Solutions
We tried many different things, timing was mainly measured using the stopwatch timer and IsHighResolution was true (other timers were used but we did not notice much difference):
1. Creating a separate thread and giving it high priority
   Result: Ineffective (using both the terrible Thread.Sleep(), and without it and using continuous polling)
2. Using a C# task (Thread pool)
   Result: very little improvement
3. Using a multimedia timer with 1ms periodicity
   Result: Ineffective or worse, multimedia timers are accurate at waking up the OS, but the OS may choose to run another thread, no 1ms guarantee, but even then, delays could be much bigger occasionally
4. Created a separate standalone C# project that just contained a while loop and stopwatch timer
   Result: most of the time the accuracy was great even in microseconds, but occasionally the thread sleeps
5. Repeated point 4, but set the process priority to Realtime/High
   Result: Very good numbers, almost not a single message had significant delay.
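A standalone while-loop test like the one described above can be sketched roughly as follows. This is only a minimal illustration, not the original test code, and the names JitterProbe, DeadlineTicks and MeasureWorstLatenessMs are hypothetical. It computes each deadline as start + n * period, so one late wake-up does not shift the later cycles, then polls Stopwatch continuously and records the worst lateness.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of any library.
static class JitterProbe
{
    // Absolute deadline of the n-th cycle, in Stopwatch ticks.
    // start + n * period avoids accumulating drift from late cycles.
    public static long DeadlineTicks(long startTicks, long periodTicks, long n)
    {
        return startTicks + n * periodTicks;
    }

    // Runs `cycles` periods of `periodMs` milliseconds and returns the
    // worst observed lateness in milliseconds.
    public static double MeasureWorstLatenessMs(double periodMs, int cycles)
    {
        long periodTicks = (long)(periodMs * Stopwatch.Frequency / 1000.0);
        long start = Stopwatch.GetTimestamp();
        double worstMs = 0.0;

        for (int n = 1; n <= cycles; n++)
        {
            long deadline = DeadlineTicks(start, periodTicks, n);

            // Continuous polling: burns one core, but never asks the
            // scheduler to put the thread to sleep voluntarily.
            while (Stopwatch.GetTimestamp() < deadline) { }

            // The time-critical call would go here; we only measure
            // how far past the deadline the loop actually woke up.
            long lateTicks = Stopwatch.GetTimestamp() - deadline;
            double lateMs = lateTicks * 1000.0 / Stopwatch.Frequency;
            if (lateMs > worstMs) worstMs = lateMs;
        }
        return worstMs;
    }
}
```

Calling JitterProbe.MeasureWorstLatenessMs(1.0, 100000) from a console Main and printing the result gives one number to compare across priority settings. Because the loop only polls, any large value it reports means the thread was preempted, not that it overslept.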
Conclusion:
From the above we found that we have 5 possible courses of action, but we need someone knowledgeable, with experience in such problems, to point us in the right direction:
1. Our tool can be optimized and the threads managed somehow to ensure the 1 ms realtime requirement. Maybe part of the optimization is setting the process priority of the tool to High or Realtime, but that does not seem like a wise decision, as users could be using several other tools at the same time.
2. We divide our tool into two processes, one that contains the GUI and all the non-time-critical operations, and another that contains the minimal amount of time-critical operations and is set to High/Realtime priority, and use IPC (like WCF) to communicate between the processes. This could benefit us in two ways:
   - Less probability of starvation for other processes, as far fewer operations are happening.
   - The process would have fewer threads, so there is much less (or no) probability of the thread sleeping.
Note: The next two points deal with kernel space. Please note that I have little information about kernel space and writing drivers, so I might be making some wrong assumptions about how it could be used.
3. Creating a driver in kernel space that uses lower-level interrupts every 1 ms to fire an event that forces the thread to perform its designated task in the process.
4. Moving the time-critical components to kernel space; any interfacing with the main body of the program could be done through APIs and callbacks.
5. Perhaps none of these are valid, and we might need to use a Windows RTOS extension like the IntervalZero RTOS platform?
The Question Itself
There are two answers I am looking for, and I hope they are backed with good sources.
1. Is this truly a threading and context switching problem? Or have we been missing something all this time?
2. Which of the 5 options is guaranteed to fix this problem, and if several are, which is the easiest? If none of these options can fix it, what can? Please remember that other tools we have benchmarked do indeed reach the required timing accuracy on Windows, and when the CPU is under heavy load, one or two timings out of 100,000 could be off by less than 2 milliseconds, which is very acceptable.
Which of the 5 options is guaranteed to fix this problem?
This depends on what accuracy you are trying to achieve. If you're aiming for, say, +/- 1 ms, you have a reasonable chance of getting it done without points 3) to 5). The combination of points 1) and 2) is the way to go:
In C#, the ThreadPriority enumeration only offers THREAD_PRIORITY_HIGHEST (2) as the maximum priority. Therefore you'd have to look into the SetThreadPriority function, which allows access to THREAD_PRIORITY_TIME_CRITICAL (15). The Process::PriorityClass property gives access to REALTIME_PRIORITY_CLASS (24). Note: Code running at such priorities will push all other code out of the way. You'd have to keep that code very light on computation and very safe.
General remarks: All depends on load. Windows can do pretty well despite the fact that it is not a "realtime OS". However, realtime systems also rely on low load. Nothing is guaranteed, not even on an RTOS when it is heavily loaded.
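The priority changes mentioned in this answer could look roughly like the sketch below. The class name PriorityBoost and its method are hypothetical; the P/Invoke declarations target the documented kernel32 functions GetCurrentThread and SetThreadPriority. It assumes the process is allowed to raise its scheduling priority. As far as I know, Windows may grant only High when that privilege is missing, so treat the Realtime request as best effort.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; Windows only.
static class PriorityBoost
{
    // Win32 value that the managed ThreadPriority enum does not expose.
    public const int THREAD_PRIORITY_TIME_CRITICAL = 15;

    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetThreadPriority(IntPtr hThread, int nPriority);

    // Call this from the time-critical thread itself, ideally in the
    // small dedicated process and not in the GUI process.
    public static void RaiseCurrentProcessAndThread()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            // Requests REALTIME_PRIORITY_CLASS for the whole process.
            self.PriorityClass = ProcessPriorityClass.RealTime;
        }

        // GetCurrentThread returns a pseudo-handle for the calling
        // thread; it does not need to be closed.
        if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
    }
}
```

A thread running like this should block or yield between cycles whenever the timing budget allows, since a busy loop at this priority can starve everything else scheduled on that core.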
I suspect nothing you do, in user-mode, to a thread's priority or affinity will guarantee the behavior you seek, so I think you may need something like your options 3 or 4, which means writing a kernel-mode driver.
In kernel-mode, there is the notion of IRQL, where code triggered to run at higher levels preempts code running at lower levels. User-mode code runs at IRQL 0, so all kernel-mode code at any higher level takes precedence. The thread scheduler itself runs at an elevated level, 2 I believe (which is called DISPATCH_LEVEL), so it can preempt any scheduled user-mode code of any priority, including, I believe, REALTIME_PRIORITY_CLASS. Hardware interrupts including timers run even higher.
A hardware timer will invoke its interrupt handler about as accurately as the timer resolution, if there's a CPU/core available at a lower IRQL (higher-level interrupt handlers not executing).
If there is much work to do, one shouldn't do it in the interrupt handler (IRQL > DISPATCH_LEVEL), but use the interrupt handler to schedule the larger body of work to run "soon" at DISPATCH_LEVEL using a Deferred Procedure Call (DPC), which still prevents the thread scheduler from interfering, but doesn't prevent other interrupt handlers from handling their hardware interrupts.
A likely problem with your option 3 is that firing an event to wake a thread to run user-mode code at IRQL 0 again allows the thread scheduler to decide when the user-mode code will execute. You may need to do your time-sensitive work in kernel-mode at DISPATCH_LEVEL.
Another issue is that interrupts fire without regard to the process context the CPU core was running. So when the timer fires, the handler likely runs in the context of a process unrelated to yours. So you may need to do your time-sensitive work in a kernel-mode driver, using kernel-space memory, independent of your process, and then feed any results back to your app later, when it resumes running and can interact with the driver. (Apps can interact with drivers by passing buffers down via the DeviceIoControl API.)
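On the application side, fetching results from such a driver through DeviceIoControl might be sketched as follows. Everything specific to the driver is an assumption made for illustration: the class name DriverChannel, the request code IOCTL_READ_RESULTS (function number 0x800) and the device path are hypothetical, and a real driver defines its own. CtlCode repeats the arithmetic of the CTL_CODE macro from the Windows SDK headers.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical helper for illustration; Windows only.
static class DriverChannel
{
    const uint FILE_DEVICE_UNKNOWN = 0x22;
    const uint METHOD_BUFFERED = 0;
    const uint FILE_ANY_ACCESS = 0;
    const uint GENERIC_READ = 0x80000000;
    const uint GENERIC_WRITE = 0x40000000;
    const uint OPEN_EXISTING = 3;

    // Same bit layout as the CTL_CODE macro.
    public static uint CtlCode(uint deviceType, uint function, uint method, uint access)
    {
        return (deviceType << 16) | (access << 14) | (function << 2) | method;
    }

    // Assumed request code; the driver and the app must agree on it.
    public static readonly uint IOCTL_READ_RESULTS =
        CtlCode(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string name, uint access, uint share,
        IntPtr security, uint creation, uint flags, IntPtr template);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle device, uint code,
        byte[] inBuffer, uint inSize, byte[] outBuffer, uint outSize,
        out uint bytesReturned, IntPtr overlapped);

    // Opens the device, asks the driver for buffered results and
    // returns only the bytes the driver actually filled in.
    public static byte[] ReadResults(string devicePath, int maxBytes)
    {
        using (SafeFileHandle device = CreateFile(devicePath,
            GENERIC_READ | GENERIC_WRITE, 0, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
        {
            if (device.IsInvalid)
                throw new Win32Exception(Marshal.GetLastWin32Error());

            byte[] output = new byte[maxBytes];
            uint returned;
            if (!DeviceIoControl(device, IOCTL_READ_RESULTS, null, 0,
                    output, (uint)output.Length, out returned, IntPtr.Zero))
                throw new Win32Exception(Marshal.GetLastWin32Error());

            Array.Resize(ref output, (int)returned);
            return output;
        }
    }
}
```

The app would call something like DriverChannel.ReadResults(@"\\.\MyTimingDriver", 4096) whenever it gets to run, with the path being whatever symbolic link the driver creates. The time-critical work itself stays in the driver, so this call does not need to meet the 1 ms deadline.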
I am not suggesting you implement a hardware timer interrupt handler; the OS already does that. Rather, use the kernel timer services to invoke your code based on the OS handling of the timer interrupt. See KeSetTimer and ExSetTimer. Both of these can call back to your code at DISPATCH_LEVEL after the timer fires.
And (even in kernel-mode) the system timer resolution may, by default, be too coarse for your 1 ms requirement.
https://msdn.microsoft.com/en-us/library/windows/hardware/dn265247(v=vs.85).aspx
For higher resolution, you may