Achieving realtime 1 millisecond accurate events on Windows

Posted 2020-06-23 06:59

Problem

I am creating a Windows 7 based C# WPF application using .NET 4.5, and one of its major features is to call certain functions that interface with custom hardware at a set of user-defined cycle times. For example, the user might choose two functions to be called every 10 or 20 milliseconds and another every 500 milliseconds. The smallest cycle time the user can choose is 1 millisecond.

At first the timings seemed accurate and the functions were called every 1 millisecond as required. But we later noticed that about 1-2% of the timings were not accurate: some functions were called just 5 milliseconds late, while others could be up to 100 milliseconds late. Even with cycle times greater than 1 ms we faced the problem that the thread was asleep at the moment it should have called the external function (a 20 ms function could be called 50 ms late because the thread was sleeping and didn't make the call).

After analysis we concluded that these delays were sporadic, with no noticeable pattern, and that the most likely reason behind them was OS scheduling and thread context switching; in other words, our thread wasn't awake all the time as we need it to be.

As Windows 7 is not an RTOS, we need to find out whether we can work around this problem somehow. We do know for sure that the problem is fixable on Windows, because we use other tools with similar functionality that meet these timing constraints with a maximum error of 0.7 ms.

Our application is multithreaded, with a maximum of about 30 threads running at the same time; its current peak CPU usage is about 13%.

Attempted Solutions

We tried many different things. Timing was mainly measured using the Stopwatch timer, with Stopwatch.IsHighResolution true (other timers were tried, but we did not notice much difference):

  1. Creating a separate thread and giving it high priority
    Result: ineffective (both with the terrible Thread.Sleep() and without it, using continuous polling)

  2. Using a C# Task (thread pool)
    Result: very little improvement

  3. Using a multimedia timer with 1 ms periodicity
    Result: ineffective or worse. Multimedia timers are accurate at waking up the OS, but the OS may choose to run another thread, so there is no 1 ms guarantee, and even then delays could occasionally be much bigger

  4. Creating a separate standalone C# project that contained just a while loop and a Stopwatch timer
    Result: most of the time the accuracy was great, even at the microsecond level, but occasionally the thread slept

  5. Repeating point 4, but with the process priority set to Realtime/High
    Result: very good numbers; almost not a single event had a significant delay. (A sketch combining attempts 4 and 5 appears after this list.)

Conclusion:

From the above we identified 5 possible courses of action, but we need someone knowledgeable, with experience in such problems, to point us in the right direction:

  1. Our tool can be optimized and its threads managed somehow to ensure the 1 ms realtime requirement. Maybe part of that optimization is setting the tool's process priority to High or Realtime, but that does not seem like a wise decision, as users could be running several other tools at the same time.

  2. We divide our tool into two processes: one containing the GUI and all the non-time-critical operations, and the other containing the minimal amount of time-critical operations and set to High/Realtime priority, using IPC (such as WCF) to communicate between the processes (a minimal IPC sketch appears after this list). This could benefit us in two ways:

    1. Less probability of starving other processes, as far fewer operations are happening.

    2. The process would have fewer threads, so there is much less (or no) probability of a thread sleeping.

Note: The next two points deal with kernel space. Please note that I have little knowledge of kernel space and of writing drivers, so I might be making some wrong assumptions about how it could be used.

  3. Creating a driver in kernel space that uses lower-level interrupts every 1 ms to fire an event that forces the thread to perform its designated task in the process.

  4. Moving the time-critical components to kernel space; any interfacing with the main body of the program could be done through APIs and callbacks.

  5. Perhaps none of these are valid, and we might need to use a Windows RTOS extension such as the IntervalZero RTOS platform?
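To make option 2 concrete, here is a minimal sketch of the IPC leg using named pipes (System.IO.Pipes) rather than WCF; the pipe name and the one-byte protocol are placeholders for illustration only:

    using System.IO.Pipes;

    static class CycleIpc
    {
        // Runs in the high-priority, time-critical process: pushes a
        // one-byte "cycle completed" notification to the GUI process.
        public static void NotifyGui()
        {
            using (var server = new NamedPipeServerStream("CycleEvents", PipeDirection.Out))
            {
                server.WaitForConnection();
                server.Write(new byte[] { 1 }, 0, 1);
            }
        }

        // Runs in the GUI process: blocks until the worker reports a cycle,
        // keeping all slow work out of the time-critical process.
        public static void WaitForCycle()
        {
            using (var client = new NamedPipeClientStream(".", "CycleEvents", PipeDirection.In))
            {
                client.Connect();
                client.ReadByte();
            }
        }
    }

A production version would keep one pipe connection open and loop rather than reconnecting per cycle; the point is only that the time-critical process does nothing but its cycles and these cheap notifications.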

The Question Itself

There are two answers I am looking for, and I hope they can be backed by good sources.

  1. Is this truly a threading and context-switching problem, or have we been missing something all this time?

  2. Which of the 5 options is guaranteed to fix this problem, and if several are, which is the easiest? If none of them can fix it, what can? Please remember that other tools we have benchmarked do reach the required timing accuracy on Windows, and even when the CPU is under heavy load, only one or two timings out of 100,000 are off, by less than 2 milliseconds, which is very acceptable.

2 Answers

冷血范 · answered 2020-06-23 07:27

Which of the 5 options is guaranteed to fix this problem?

This depends on what accuracy you are trying to achieve. If you're aiming for, say, +/- 1 ms, you have a reasonable chance of getting it done without points 3) to 5). The combination of points 1) and 2) is the way to go:

  • Split your code into time-critical parts and less time-critical parts (GUI etc.) and put them into separate processes. Let them communicate by means of decent IPC (pipes, shared memory, and the like).
  • Raise the process priority class and the thread priority of the time-critical process. Unfortunately, the C# ThreadPriority enumeration only permits THREAD_PRIORITY_HIGHEST (2) as the maximum priority. Therefore you'd have to look into the SetThreadPriority function, which allows access to THREAD_PRIORITY_TIME_CRITICAL (15). The Process.PriorityClass property allows access to REALTIME_PRIORITY_CLASS (24). Note: code running at such priorities will push all other code out of the way, so it must do very little computation and be very safe.
  • Use the ProcessThread.ProcessorAffinity property to adjust core usage. Hint: you may want to keep your time-critical threads away from CPU_0 (property value 0x0001), because the Windows kernel prefers this CPU for specific operations. Example: on a platform with 4 logical processors, you'd specify a ProcessorAffinity value of 0x000E to exclude CPU_0.
  • The system's timer resolution is often set by other applications, so it is only predictable when you dictate the system timer resolution yourself. Some applications/drivers even set the timer resolution to 0.5 ms. This may be beyond your setting and can lead to hiccups in your application. See this SO answer on how to set the timer resolution to 0.5 ms. (Note: support for this resolution is platform dependent.) A combined sketch of these steps follows.
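A minimal sketch combining the steps above, assuming a 4-logical-processor machine; SetThreadPriority is a documented Win32 call, but NtSetTimerResolution is undocumented (it is the route the SO answer mentioned above uses), so verify both on your target platform:

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;

    static class RealtimeSetup
    {
        [DllImport("kernel32.dll")]
        static extern IntPtr GetCurrentThread();

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool SetThreadPriority(IntPtr hThread, int nPriority);

        // Undocumented ntdll export; resolution is in 100 ns units (5000 = 0.5 ms).
        [DllImport("ntdll.dll")]
        static extern int NtSetTimerResolution(uint desired, bool set, out uint actual);

        const int THREAD_PRIORITY_TIME_CRITICAL = 15;

        public static void Apply()
        {
            var proc = Process.GetCurrentProcess();

            // REALTIME_PRIORITY_CLASS (24) for the time-critical process.
            proc.PriorityClass = ProcessPriorityClass.RealTime;

            // The managed ThreadPriority enum tops out at THREAD_PRIORITY_HIGHEST (2),
            // so go through the Win32 API for TIME_CRITICAL (15).
            SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

            // Keep the process off CPU_0: mask 0x000E = cores 1-3 on a machine
            // with 4 logical processors (per-thread affinity would instead go
            // through ProcessThread.ProcessorAffinity).
            proc.ProcessorAffinity = (IntPtr)0x000E;

            // Dictate the timer resolution yourself: request 0.5 ms.
            uint actual;
            NtSetTimerResolution(5000, true, out actual);
        }
    }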

General remarks: it all depends on load. Windows can do pretty well despite the fact that it is not a "realtime OS". However, realtime systems also rely on low load; nothing is guaranteed, not even on an RT-OS, when it is heavily loaded.

Evening l夕情丶 · answered 2020-06-23 07:41

I suspect nothing you do in user mode to a thread's priority or affinity will guarantee the behavior you seek, so I think you may need something like your options 3 or 4, which means writing a kernel-mode driver.

In kernel mode there is the notion of IRQL, where code triggered to run at higher levels preempts code running at lower levels. User-mode code runs at IRQL 0, so all kernel-mode code at any higher level takes precedence. The thread scheduler itself runs at an elevated level, 2 I believe (which is called DISPATCH_LEVEL), so it can preempt any scheduled user-mode code of any priority, including, I believe, REALTIME_PRIORITY_CLASS. Hardware interrupts, including timers, run even higher.

A hardware timer will invoke its interrupt handler about as accurately as the timer resolution allows, provided a CPU/core is available at a lower IRQL (i.e., no higher-level interrupt handlers are executing).

If there is much work to do, one shouldn't do it in the interrupt handler (which runs at IRQL > DISPATCH_LEVEL), but instead use the handler to schedule the larger body of work to run "soon" at DISPATCH_LEVEL using a Deferred Procedure Call (DPC). That still prevents the thread scheduler from interfering, but doesn't prevent other interrupt handlers from handling their hardware interrupts.

A likely problem with your option 3 is that firing an event to wake a thread that runs user-mode code at IRQL 0 again lets the thread scheduler decide when that code will execute. You may need to do your time-sensitive work in kernel mode at DISPATCH_LEVEL.

Another issue is that interrupts fire without regard to the process context the CPU core was running in. So when the timer fires, the handler likely runs in the context of a process unrelated to yours. You may therefore need to do your time-sensitive work in a kernel-mode driver, using kernel-space memory, independent of your process, and then feed any results back to your app later, when it resumes running and can interact with the driver. (Apps can interact with drivers by passing buffers down via the DeviceIoControl API; a sketch of the user-mode side follows.)
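For the user-mode side of that handshake, a hedged sketch; the device name \\.\MyTimingDriver and the IOCTL code are placeholders that a real driver would define:

    using System;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class DriverClient
    {
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern SafeFileHandle CreateFile(string name, uint access, uint share,
            IntPtr security, uint creation, uint flags, IntPtr template);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool DeviceIoControl(SafeFileHandle device, uint ioctl,
            byte[] inBuf, int inLen, byte[] outBuf, int outLen,
            out int returned, IntPtr overlapped);

        const uint GENERIC_READ_WRITE = 0xC0000000; // GENERIC_READ | GENERIC_WRITE
        const uint OPEN_EXISTING = 3;
        const uint IOCTL_FETCH_RESULTS = 0x222000;  // placeholder IOCTL code

        // Pull buffered results out of the (hypothetical) kernel-mode driver.
        public static byte[] FetchResults()
        {
            using (var dev = CreateFile(@"\\.\MyTimingDriver", GENERIC_READ_WRITE, 0,
                                        IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
            {
                var outBuf = new byte[4096];
                int returned;
                DeviceIoControl(dev, IOCTL_FETCH_RESULTS, null, 0,
                                outBuf, outBuf.Length, out returned, IntPtr.Zero);
                Array.Resize(ref outBuf, returned);
                return outBuf;
            }
        }
    }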

I am not suggesting you implement a hardware timer interrupt handler; the OS already does that. Rather, use the kernel timer services to invoke your code based on the OS's handling of the timer interrupt. See KeSetTimer and ExSetTimer; both can call back into your code at DISPATCH_LEVEL after the timer fires.

And (even in kernel-mode) the system timer resolution may, by default, be too coarse for your 1 ms requirement.

https://msdn.microsoft.com/en-us/library/windows/hardware/dn265247(v=vs.85).aspx

For example, for Windows running on an x86 processor, the default interval between system clock ticks is typically about 15 milliseconds.

For higher resolution, you may

  1. change the system clock resolution

Starting with Windows 2000, a driver can call the ExSetTimerResolution routine to change the time interval between successive system clock interrupts. For example, a driver can call this routine to change the system clock from its default rate to its maximum rate to improve timer accuracy. However, using ExSetTimerResolution has several disadvantages compared to using high-resolution timers created by ExAllocateTimer.

...

  2. use newer kernel-mode APIs for high-resolution timers that manage the clock resolution automatically

Starting with Windows 8.1, drivers can use the ExXxxTimer routines to manage high-resolution timers. The accuracy of a high-resolution timer is limited only by the maximum supported resolution of the system clock. In contrast, timers that are limited to the default system clock resolution are significantly less accurate.

However, high-resolution timers require system clock interrupts to—at least, temporarily—occur at a higher rate, which tends to increase power consumption. Thus, drivers should use high-resolution timers only when timer accuracy is essential, and use default-resolution timers in all other cases.
