-->

Infinite state machine with an IDisposable

Posted 2019-06-28 08:24

Question:

Let's say I have an infinite state machine that generates random MD5 hashes:

public static IEnumerable<string> GetHashes()
{
    using (var hash = System.Security.Cryptography.MD5.Create())
    {
        while (true)
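            // ComputeHash returns byte[]; format it as a hex string to match IEnumerable<string>
            yield return BitConverter.ToString(hash.ComputeHash(Guid.NewGuid().ToByteArray()));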
    }
}

In the example above I use a using statement. Will the Dispose() method ever be called? And consequently, will the unmanaged resources ever be freed?

For example, if I use the machine as follows:

public static void Test()
{
    int counter = 0;
    var hashes = GetHashes();
    foreach(var md5 in hashes)
    {
        Console.WriteLine(md5);
        counter++;
        if (counter > 10)
            break;
    }
}

Once the hashes variable goes out of scope (and, I presume, is garbage collected), will Dispose be called to free the resources used by System.Security.Cryptography.MD5, or is this a memory leak?

Answer 1:

Let's pare your original code down to the essentials while keeping it interesting enough to analyze. This is not exactly equivalent to what you posted, but it still consumes the values produced by the iterator.

class Disposable : IDisposable {
    public void Dispose() {
        Console.WriteLine("Disposed!");
    }
}

IEnumerable<int> CreateEnumerable() {
    int i = 0;
    using (var d = new Disposable()) {
       while (true) yield return ++i;
    }
}

void UseEnumerable() {
    foreach (int i in CreateEnumerable()) {
        Console.WriteLine(i);
        if (i == 10) break;
    }
}

This will print the numbers from 1 to 10 before printing Disposed!

What actually happens under the covers? A whole lot more. Let's tackle the outer layer first, UseEnumerable. The foreach is syntactic sugar for the following:

var e = CreateEnumerable().GetEnumerator();
try {
    while (e.MoveNext()) {
        int i = e.Current;
        Console.WriteLine(i);
        if (i == 10) break;
    }
} finally {
    e.Dispose();
}

For the exact details (even this is slightly simplified) I refer you to the C# language specification, section 8.8.4. The important bit here is that a foreach entails an implicit call to the Dispose method of the enumerator.
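Because the implicit Dispose call sits in a finally block, it also runs when the loop body throws, not only on break. The following is a minimal sketch of that case; the names ForeachDisposeDemo, Numbers, Run and Log are made up for illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;

static class ForeachDisposeDemo
{
    // Records the order of events so it can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // Infinite iterator whose cleanup is a plain finally block.
    static IEnumerable<int> Numbers()
    {
        try
        {
            int i = 0;
            while (true) yield return ++i;
        }
        finally
        {
            Log.Add("finally");
        }
    }

    public static void Run()
    {
        Log.Clear();
        try
        {
            foreach (int n in Numbers())
            {
                Log.Add("item " + n);
                if (n == 2) throw new InvalidOperationException("stop");
            }
        }
        catch (InvalidOperationException)
        {
            Log.Add("caught");
        }
    }

    static void Main()
    {
        Run();
        // prints: item 1, item 2, finally, caught
        Console.WriteLine(string.Join(", ", Log));
    }
}
```

The iterator's finally block runs before the catch handler, because the foreach disposes the enumerator while the exception is still unwinding.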

Next, the using statement in CreateEnumerable is syntactic sugar as well. In fact, let's write out the whole thing in primitive statements so we can make more sense of the translation later:

IEnumerable<int> CreateEnumerable() {
    int i = 0;
    Disposable d = new Disposable();
    try {
       repeat: 
       i = i + 1;
       yield return i;
       goto repeat;
    } finally {
       d.Dispose();
    }
}

The exact rules for the implementation of iterator blocks are detailed in section 10.14 of the language specification. They're given in terms of abstract operations, not code. A good discussion of the kind of code the C# compiler generates, and what each part does, is given in C# in Depth, but I'm going to give a simple translation instead that still complies with the specification. To reiterate, this is not what the compiler will actually produce, but it's a good enough approximation to illustrate what's happening, and it leaves out the hairier bits that deal with threading and optimization.

class CreateEnumerable_Enumerator : IEnumerator<int> {
    // local variables are promoted to instance fields
    private int i;
    private Disposable d;

    // implementation of Current
    private int current;
    public int Current => current;
    object IEnumerator.Current => current;

    // State machine
    enum State { Before, Running, Suspended, After };
    private State state = State.Before;

    // Section 10.14.4.1
    public bool MoveNext() {
        switch (state) {
            case State.Before: {
                    state = State.Running;
                    // begin iterator block
                    i = 0;
                    d = new Disposable();
                    i = i + 1;
                    // yield return occurs here
                    current = i;
                    state = State.Suspended;
                    return true;
                }
            case State.Running: return false; // can't happen
            case State.Suspended: {
                    state = State.Running;
                    // goto repeat
                    i = i + 1;
                    // yield return occurs here
                    current = i;
                    state = State.Suspended;
                    return true;
                }
            case State.After: return false; 
            default: return false;  // can't happen
        }
    }

    // Section 10.14.4.3
    public void Dispose() {
        switch (state) {
            case State.Before: state = State.After; break;
            case State.Running: break; // unspecified
            case State.Suspended: {
                    state = State.Running;
                    // finally occurs here
                    d.Dispose();
                    state = State.After;
                }
                break;
            case State.After: return;
            default: return;    // can't happen
        }
    }

    public void Reset() { throw new NotImplementedException(); }
}

class CreateEnumerable_Enumerable : IEnumerable<int> {
  public IEnumerator<int> GetEnumerator() {
    return new CreateEnumerable_Enumerator();
  }

  IEnumerator IEnumerable.GetEnumerator() {
    return GetEnumerator();
  }
}

IEnumerable<int> CreateEnumerable() {
  return new CreateEnumerable_Enumerable();
}

The essential point is that the code block is split at each yield return or yield break statement, and the iterator is responsible for remembering where execution stopped. Any finally block that encloses the suspension point is deferred until Dispose is called. The infinite loop in your code is no longer really infinite, because each yield return hands control back to the caller. Note that, because the finally block is no longer a true finally block, its execution is less certain with iterators: while the iterator is suspended, it runs only if someone calls Dispose on the enumerator. This is why using foreach (or any other construct that calls the iterator's Dispose method in a finally block) is essential.

This is a simplified example; things get much more interesting when you make the loop more complex, introduce exceptions, and so on. The burden of "just making this work" is on the compiler.
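To see what is at stake, here is a small sketch that drives the enumerator by hand, once without calling Dispose and once inside a using statement. Tracker, Count and the two helper methods are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a resource such as the MD5 instance.
sealed class Tracker : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

static class AbandonedIteratorDemo
{
    // Infinite iterator that holds the tracker in a using block.
    static IEnumerable<int> Count(Tracker t)
    {
        using (t)
        {
            int i = 0;
            while (true) yield return ++i;
        }
    }

    // Pulls three values by hand and walks away without calling Dispose.
    public static bool AbandonWithoutDispose()
    {
        var t = new Tracker();
        IEnumerator<int> e = Count(t).GetEnumerator();
        for (int k = 0; k < 3; k++) e.MoveNext();
        return t.Disposed;
    }

    // Same three pulls, but the enumerator itself is wrapped in using.
    public static bool EnumerateWithUsing()
    {
        var t = new Tracker();
        using (IEnumerator<int> e = Count(t).GetEnumerator())
        {
            for (int k = 0; k < 3; k++) e.MoveNext();
        }
        return t.Disposed;
    }

    static void Main()
    {
        Console.WriteLine(AbandonWithoutDispose()); // False
        Console.WriteLine(EnumerateWithUsing());    // True
    }
}
```

In the first case the deferred finally never runs. The compiler-generated iterator class has no finalizer, so garbage collection will not run it either; at best the abandoned resource falls back on whatever finalizer it has of its own.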



Answer 2:

Largely, it depends on how you code it. But in your example, Dispose will be called.

Here's an explanation of how iterators get compiled.

And specifically, talking about finally:

Iterators pose an awkward problem. Instead of the whole method executing before the stack frame is popped, execution effectively pauses each time a value is yielded. There's no way of guaranteeing that the caller will ever use the iterator again, in any way, shape or form. If you require some more code to be executed at some point after the value is yielded, you're in trouble: you can't guarantee it will happen. To cut to the chase, code in a finally block which would normally be executed in almost all circumstances before leaving the method can't be relied on quite as much.

...

The state machine is built so that finally blocks are executed when an iterator is used properly, however. That's because IEnumerator<T> implements IDisposable, and the C# foreach loop calls Dispose on iterators (even the nongeneric IEnumerator ones, if they implement IDisposable). The IDisposable implementation in the generated iterator works out which finally blocks are relevant to the current position (based on the state, as always) and executes the appropriate code.
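The "relevant to the current position" part can be observed directly. The sketch below uses an iterator with two separate try/finally regions and stops at different points; FinallyPositionDemo, TwoRegions, StopAt and NeverStarted are illustrative names, not part of the quoted article.

```csharp
using System;
using System.Collections.Generic;

static class FinallyPositionDemo
{
    // Two consecutive try/finally regions, each containing yield returns.
    static IEnumerable<string> TwoRegions(List<string> log)
    {
        try
        {
            yield return "a";
            yield return "b";
        }
        finally
        {
            log.Add("finally 1");
        }

        try
        {
            yield return "c";
        }
        finally
        {
            log.Add("finally 2");
        }
    }

    // Breaks out of the foreach once the given value has been seen.
    public static string StopAt(string last)
    {
        var log = new List<string>();
        foreach (string s in TwoRegions(log))
        {
            if (s == last) break;
        }
        return string.Join(", ", log);
    }

    // Disposes the enumerator before MoveNext was ever called.
    public static string NeverStarted()
    {
        var log = new List<string>();
        IEnumerator<string> e = TwoRegions(log).GetEnumerator();
        e.Dispose();
        return string.Join(", ", log);
    }

    static void Main()
    {
        Console.WriteLine(StopAt("a"));          // finally 1
        Console.WriteLine(StopAt("c"));          // finally 1, finally 2
        Console.WriteLine(NeverStarted() == ""); // True
    }
}
```

Stopping at "a" runs only the first finally block, because the iterator is suspended inside the first region. Stopping at "c" shows the first finally having run on the normal path during MoveNext and the second one running from Dispose. Disposing an enumerator that was never started runs neither, since the body has not begun.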