Async/await, custom awaiter and garbage collector

Published 2019-01-11 01:34

Question:

I'm dealing with a situation where a managed object gets prematurely finalized in the middle of an async method.

This is a hobby home automation project (Windows 8.1, .NET 4.5.1), where I supply a C# callback to an unmanaged 3rd party DLL. The callback gets invoked upon a certain sensor event.

To handle the event, I use async/await and a simple custom awaiter (rather than TaskCompletionSource). I do it this way partly to reduce the number of unnecessary allocations, but mostly out of curiosity as a learning exercise.

Below is a heavily stripped-down version of what I have, using a Win32 timer-queue timer to simulate the unmanaged event source. Let's start with the output:

Press Enter to exit...
Awaiter()
tick: 0
tick: 1
~Awaiter()
tick: 2
tick: 3
tick: 4

Note how my awaiter gets finalized after the second tick. This is unexpected.

The code (a console app):

using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static async Task TestAsync()
        {
            var awaiter = new Awaiter();
            //var hold = GCHandle.Alloc(awaiter);

            WaitOrTimerCallbackProc callback = (a, b) =>
                awaiter.Continue();

            IntPtr timerHandle;
            if (!CreateTimerQueueTimer(out timerHandle, 
                    IntPtr.Zero, 
                    callback, 
                    IntPtr.Zero, 500, 500, 0))
                throw new System.ComponentModel.Win32Exception(
                    Marshal.GetLastWin32Error());

            var i = 0;
            while (true)
            {
                await awaiter;
                Console.WriteLine("tick: " + i++);
            }
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Press Enter to exit...");
            var task = TestAsync();
            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.ReadLine();
        }

        // custom awaiter
        public class Awaiter : 
            System.Runtime.CompilerServices.INotifyCompletion
        {
            Action _continuation;

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            // resume after await, called upon external event
            public void Continue()
            {
                var continuation = Interlocked.Exchange(ref _continuation, null);
                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                Volatile.Write(ref _continuation, continuation);
            }
        }

        // p/invoke
        delegate void WaitOrTimerCallbackProc(IntPtr lpParameter, bool TimerOrWaitFired);

        [DllImport("kernel32.dll", SetLastError = true)] // required for Marshal.GetLastWin32Error()
        static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer,
           IntPtr TimerQueue, WaitOrTimerCallbackProc Callback, IntPtr Parameter,
           uint DueTime, uint Period, uint Flags);
    }
}

I managed to suppress the collection of awaiter with this line:

var hold = GCHandle.Alloc(awaiter);

However, I don't fully understand why I have to create a strong reference like this. The awaiter is referenced inside an endless loop. AFAICT, it does not go out of scope until the task returned by TestAsync completes (or is cancelled/faulted). And the task itself is referenced inside Main forever.

Eventually, I reduced TestAsync to just this:

static async Task TestAsync()
{
    var awaiter = new Awaiter();
    //var hold = GCHandle.Alloc(awaiter);

    var i = 0;
    while (true)
    {
        await awaiter;
        Console.WriteLine("tick: " + i++);
    }
}

The collection still takes place. I suspect the whole compiler-generated state machine object is getting collected. Can someone please explain why this is happening?

Now, with the following minor modification, the awaiter no longer gets garbage-collected:

static async Task TestAsync()
{
    var awaiter = new Awaiter();
    //var hold = GCHandle.Alloc(awaiter);

    var i = 0;
    while (true)
    {
        //await awaiter;
        await Task.Delay(500);
        Console.WriteLine("tick: " + i++);
    }
}

Updated: this fiddle shows how the awaiter object gets garbage-collected without any p/invoke code. I think the reason might be that there are no external references to awaiter outside the initial state of the generated state machine object. I need to study the compiler-generated code.


Updated: here's the compiler-generated code (for this fiddle, VS2012). Apparently, the Task returned by stateMachine.t__builder.Task doesn't keep a reference to (or rather, a copy of) the state machine itself (stateMachine). Am I missing something?

    private static Task TestAsync()
    {
      Program.TestAsyncd__0 stateMachine;
      stateMachine.t__builder = AsyncTaskMethodBuilder.Create();
      stateMachine.1__state = -1;
      stateMachine.t__builder.Start<Program.TestAsyncd__0>(ref stateMachine);
      return stateMachine.t__builder.Task;
    }

    [CompilerGenerated]
    [StructLayout(LayoutKind.Auto)]
    private struct TestAsyncd__0 : IAsyncStateMachine
    {
      public int 1__state;
      public AsyncTaskMethodBuilder t__builder;
      public Program.Awaiter awaiter5__1;
      public int i5__2;
      private object u__awaiter3;
      private object t__stack;

      void IAsyncStateMachine.MoveNext()
      {
        try
        {
          bool flag = true;
          Program.Awaiter awaiter;
          switch (this.1__state)
          {
            case -3:
              goto label_7;
            case 0:
              awaiter = (Program.Awaiter) this.u__awaiter3;
              this.u__awaiter3 = (object) null;
              this.1__state = -1;
              break;
            default:
              this.awaiter5__1 = new Program.Awaiter();
              this.i5__2 = 0;
              goto label_5;
          }
label_4:
          awaiter.GetResult();
          Console.WriteLine("tick: " + (object) this.i5__2++);
label_5:
          awaiter = this.awaiter5__1.GetAwaiter();
          if (!awaiter.IsCompleted)
          {
            this.1__state = 0;
            this.u__awaiter3 = (object) awaiter;
            this.t__builder.AwaitOnCompleted<Program.Awaiter, Program.TestAsyncd__0>(ref awaiter, ref this);
            flag = false;
            return;
          }
          else
            goto label_4;
        }
        catch (Exception ex)
        {
          this.1__state = -2;
          this.t__builder.SetException(ex);
          return;
        }
label_7:
        this.1__state = -2;
        this.t__builder.SetResult();
      }

      [DebuggerHidden]
      void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
      {
        this.t__builder.SetStateMachine(param0);
      }
    }

Answer 1:

I've removed all the p/invoke stuff and re-created a simplified version of the compiler-generated state machine logic. It exhibits the same behavior: the awaiter gets garbage-collected after the first invocation of the state machine's MoveNext method.

Microsoft has recently done an excellent job of providing a Web UI to their .NET reference sources, which has been very helpful. After studying the implementation of AsyncTaskMethodBuilder and, most importantly, AsyncMethodBuilderCore.GetCompletionAction, I now believe the GC behavior I'm seeing makes perfect sense. I'll try to explain that below.

The code:

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;

namespace ConsoleApplication
{
    public class Program
    {
        // Original version with async/await

        /*
        static async Task TestAsync()
        {
            Console.WriteLine("Enter TestAsync");
            var awaiter = new Awaiter();
            //var hold = GCHandle.Alloc(awaiter);

            var i = 0;
            while (true)
            {
                await awaiter;
                Console.WriteLine("tick: " + i++);
            }
            Console.WriteLine("Exit TestAsync");
        }
        */

        // Manually coded state machine version

        struct StateMachine: IAsyncStateMachine
        {
            public int _state;
            public Awaiter _awaiter;
            public AsyncTaskMethodBuilder _builder;

            public void MoveNext()
            {
                Console.WriteLine("StateMachine.MoveNext, state: " + this._state);
                switch (this._state)
                {
                    case -1:
                        {
                            this._awaiter = new Awaiter();
                            goto case 0;
                        };
                    case 0:
                        {
                            this._state = 0;
                            var awaiter = this._awaiter;
                            this._builder.AwaitOnCompleted(ref awaiter, ref this);
                            return;
                        };

                    default:
                        throw new InvalidOperationException();
                }
            }

            public void SetStateMachine(IAsyncStateMachine stateMachine)
            {
                Console.WriteLine("StateMachine.SetStateMachine, state: " + this._state);
                this._builder.SetStateMachine(stateMachine);
                // s_strongRef = stateMachine;
            }

            static object s_strongRef = null;
        }

        static Task TestAsync()
        {
            StateMachine stateMachine = new StateMachine();
            stateMachine._state = -1;

            stateMachine._builder = AsyncTaskMethodBuilder.Create();
            stateMachine._builder.Start(ref stateMachine);

            return stateMachine._builder.Task;
        }

        public static void Main(string[] args)
        {
            var task = TestAsync();
            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();
        }

        // custom awaiter
        public class Awaiter :
            System.Runtime.CompilerServices.INotifyCompletion
        {
            Action _continuation;

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            // resume after await, called upon external event
            public void Continue()
            {
                var continuation = Interlocked.Exchange(ref _continuation, null);
                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                Console.WriteLine("Awaiter.OnCompleted");
                Volatile.Write(ref _continuation, continuation);
            }
        }
    }
}

The compiler-generated state machine is a mutable struct that is passed around by ref. Apparently, this is an optimization to avoid extra allocations.
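To make the struct semantics concrete, here is a minimal sketch. The types IStep, Machine and StructCopyDemo are hypothetical and only for illustration; they are not the compiler-generated types. It shows that passing a struct by ref mutates the caller's copy in place, while converting it to an interface reference boxes it, i.e. creates an independent copy on the heap.

```csharp
using System;

// Hypothetical types for illustration; not part of the original code.
public interface IStep
{
    void Advance();
    int State { get; }
}

public struct Machine : IStep
{
    private int _state;
    public void Advance() { _state++; }
    public int State { get { return _state; } }
}

public static class StructCopyDemo
{
    // Mutates the caller's struct in place: no copy, no heap allocation.
    static void AdvanceByRef(ref Machine machine)
    {
        machine.Advance();
    }

    // Returns { state of the local struct, state of the boxed copy }.
    public static int[] Run()
    {
        var local = new Machine();
        AdvanceByRef(ref local);      // local.State is now 1

        IStep boxed = local;          // boxing: heap copy that starts at State == 1
        boxed.Advance();              // only the heap copy advances (2)
        boxed.Advance();              // (3)

        return new[] { local.State, boxed.State }; // { 1, 3 }
    }
}
```

Once the copy is boxed, the stack-allocated original and the heap copy evolve independently, so whoever holds the reference to the box is the one keeping the "live" state machine reachable.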

The core part of this is taking place inside AsyncMethodBuilderCore.GetCompletionAction, where the current state machine struct gets boxed, and the reference to the boxed copy is kept by the continuation callback passed to INotifyCompletion.OnCompleted.

This is the only reference to the state machine that has a chance to withstand the GC and survive after await. The Task object returned by TestAsync does not hold a reference to it; only the await continuation callback does. I believe this is done on purpose, to preserve the efficient GC behavior.
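The reachability argument can be reproduced without any async machinery. The sketch below uses hypothetical stand-in classes (FakeAwaiter, FakeStateMachineBox, CycleDemo), not the real compiler-generated types. It builds the same cycle (awaiter -> continuation delegate -> box -> awaiter) and checks with a WeakReference whether the box survives a forced collection. When nothing outside the cycle references it, the box is eligible for collection; when a static field references the awaiter, the whole cycle stays alive.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins; the names are illustrative only.
public sealed class FakeAwaiter
{
    public Action Continuation;          // plays the role of Awaiter._continuation
}

public sealed class FakeStateMachineBox
{
    public FakeAwaiter Awaiter;          // plays the role of the awaiter field in the box
    public void MoveNext() { }
}

public static class CycleDemo
{
    static FakeAwaiter s_root;           // optional root outside the cycle

    // Builds the cycle in a separate, non-inlined method so that no local
    // variable of the caller keeps the objects alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference BuildCycle(bool rootIt)
    {
        var box = new FakeStateMachineBox();
        var awaiter = new FakeAwaiter();
        box.Awaiter = awaiter;
        awaiter.Continuation = box.MoveNext;   // the delegate's Target is the box
        if (rootIt)
            s_root = awaiter;
        return new WeakReference(box);
    }

    // True if the box is still alive after a forced full collection.
    public static bool SurvivesCollection(bool rootIt)
    {
        s_root = null;
        WeakReference weak = BuildCycle(rootIt);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

SurvivesCollection(true) returns true because the static field roots the awaiter. SurvivesCollection(false) would normally be expected to return false on the desktop CLR, although the exact timing of a collection is never guaranteed, so treat that case as an observation rather than a contract.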

Note the commented line:

// s_strongRef = stateMachine;

If I un-comment it, the boxed copy of the state machine doesn't get GC'ed, and awaiter stays alive as a part of it. Of course, this is not a solution, but it illustrates the problem.

So, I've come to the following conclusion. While an async operation is in flight and the state machine's MoveNext is not currently executing, it's the responsibility of the "keeper" of the continuation callback to hold a strong reference to the callback itself, to make sure the boxed copy of the state machine does not get garbage-collected.

For example, in the case of YieldAwaitable (returned by Task.Yield), the external reference to the continuation callback is kept by the ThreadPool task scheduler, as a result of the ThreadPool.QueueUserWorkItem call. In the case of Task.GetAwaiter, it is indirectly referenced by the task object.
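As a rough illustration of the Task.GetAwaiter case, the sketch below awaits a TaskCompletionSource task that is rooted by a static field. The class TaskKeeperDemo and its members are hypothetical names, and the code assumes a console-style environment with no SynchronizationContext (blocking on Result could deadlock otherwise). The awaited task stores the continuation, the static field keeps the task reachable, and so the pending async method survives a forced collection and resumes when the result is set.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskKeeperDemo
{
    // The static field is a GC root: it keeps the source and its task alive,
    // and the task in turn keeps the registered continuation alive.
    static TaskCompletionSource<int> s_source;

    static async Task<int> WaitForEventAsync()
    {
        int value = await s_source.Task;   // continuation is stored in the task
        return value + 1;
    }

    public static int Run()
    {
        s_source = new TaskCompletionSource<int>();
        Task<int> pending = WaitForEventAsync();

        // Force a collection while the operation is in flight.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        s_source.SetResult(41);            // simulates the external event
        return pending.Result;             // 42
    }
}
```

The custom Awaiter in the question has no such root: it stores the continuation, but nothing the CLR knows about references the Awaiter itself.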

In my case, the "keeper" of the continuation callback is the Awaiter itself.

Thus, as long as there are no external references to the continuation callback that the CLR is aware of (outside the state machine object), the custom awaiter should take steps to keep the callback object alive. This, in turn, would keep the whole state machine alive. The following steps would be necessary in this case:

  1. Call the GCHandle.Alloc on the callback upon INotifyCompletion.OnCompleted.
  2. Call GCHandle.Free when the async event has actually happened, before invoking the continuation callback.
  3. Implement IDisposable to call GCHandle.Free if the event has never happened.
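The mechanism behind steps 1 and 2 can be shown in isolation. This is a minimal sketch with a hypothetical HandleDemo class: a delegate that is referenced only by a normal GCHandle (plus a WeakReference used for observation) survives a forced collection for as long as the handle is allocated.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class HandleDemo
{
    // Creates the delegate in a separate, non-inlined method so that no local
    // variable of the caller references it directly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static GCHandle AllocateHeld(out WeakReference weak)
    {
        var payload = new byte[16];
        // Capturing a local forces a fresh closure and delegate instance
        // (a non-capturing lambda would be cached in a static field).
        Action continuation = () => GC.KeepAlive(payload);
        weak = new WeakReference(continuation);
        return GCHandle.Alloc(continuation);   // GCHandleType.Normal: a strong root
    }

    // True if the delegate is still alive while the handle is allocated.
    public static bool AliveWhileHeld()
    {
        WeakReference weak;
        GCHandle hold = AllocateHeld(out weak);
        try
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            return weak.IsAlive;
        }
        finally
        {
            hold.Free();                       // step 2: always release the handle
        }
    }
}
```

Every GCHandle.Alloc needs a matching Free, otherwise the target is never collected; this is why step 3 adds a Dispose path for the case where the event never arrives.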

Given that, below is a version of the original timer callback code that works correctly. Note that there is no need to put a strong hold on the timer callback delegate (WaitOrTimerCallbackProc callback). It is kept alive as a part of the state machine. Updated: as pointed out by @svick, this statement may be specific to the current implementation of the state machine (C# 5.0). I've added GC.KeepAlive(callback) to eliminate any dependency on this behavior, in case it changes in future compiler versions.

using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        // Test task
        static async Task TestAsync(CancellationToken token)
        {
            using (var awaiter = new Awaiter())
            {
                WaitOrTimerCallbackProc callback = (a, b) =>
                    awaiter.Continue();
                try
                {
                    IntPtr timerHandle;
                    if (!CreateTimerQueueTimer(out timerHandle,
                            IntPtr.Zero,
                            callback,
                            IntPtr.Zero, 500, 500, 0))
                        throw new System.ComponentModel.Win32Exception(
                            Marshal.GetLastWin32Error());
                    try
                    {
                        var i = 0;
                        while (true)
                        {
                            token.ThrowIfCancellationRequested();
                            await awaiter;
                            Console.WriteLine("tick: " + i++);
                        }
                    }
                    finally
                    {
                        DeleteTimerQueueTimer(IntPtr.Zero, timerHandle, IntPtr.Zero);
                    }
                }
                finally
                {
                    // reference the callback at the end
                    // to avoid a chance for it to be GC'ed
                    GC.KeepAlive(callback);
                }
            }
        }

        // Entry point
        static void Main(string[] args)
        {
            // cancel in 10s
            var testTask = TestAsync(new CancellationTokenSource(10 * 1000).Token);

            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);

            Thread.Sleep(2000);
            Console.WriteLine("Press Enter to GC...");
            Console.ReadLine();

            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();
        }

        // Custom awaiter
        public class Awaiter :
            System.Runtime.CompilerServices.INotifyCompletion,
            IDisposable
        {
            Action _continuation;
            GCHandle _hold = new GCHandle();

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            void ReleaseHold()
            {
                if (_hold.IsAllocated)
                    _hold.Free();
            }

            // resume after await, called upon external event
            public void Continue()
            {
                Action continuation;

                // it's OK to use lock (this)
                // the C# compiler would never do this,
                // because it's slated to work with struct awaiters
                lock (this)
                {
                    continuation = _continuation;
                    _continuation = null;
                    ReleaseHold();
                }

                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                lock (this)
                {
                    ReleaseHold();
                    _continuation = continuation;
                    _hold = GCHandle.Alloc(_continuation);
                }
            }

            // IDisposable
            public void Dispose()
            {
                lock (this)
                {
                    _continuation = null;
                    ReleaseHold();
                }
            }
        }

        // p/invoke
        delegate void WaitOrTimerCallbackProc(IntPtr lpParameter, bool TimerOrWaitFired);

        [DllImport("kernel32.dll", SetLastError = true)] // required for Marshal.GetLastWin32Error()
        static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer,
            IntPtr TimerQueue, WaitOrTimerCallbackProc Callback, IntPtr Parameter,
            uint DueTime, uint Period, uint Flags);

        [DllImport("kernel32.dll")]
        static extern bool DeleteTimerQueueTimer(IntPtr TimerQueue, IntPtr Timer,
            IntPtr CompletionEvent);
    }
}