I'd love to figure it out myself but I was wondering roughly what's the algorithm for converting a function with yield statements into a state machine for an enumerator? For example how does C# turn this:
IEnumerator<string> strings(IEnumerable<string> args)
{ IEnumerator<string> enumerator2 = getAnotherEnumerator();
foreach(var arg in arg)
{ enumerator2.MoveNext();
yield return arg+enumerator.Current;
}
}
into this:
bool MoveNext()
{ switch (this.state)
{
case 0:
this.state = -1;
this.enumerator2 = getAnotherEnumerator();
this.argsEnumerator = this.args.GetEnumerator();
this.state = 1;
while (this.argsEnumerator.MoveNext())
{
this.arg = this.argsEnumerator.Current;
this.enumerator2.MoveNext();
this.current = this.arg + this.enumerator2.Current;
this.state = 2;
return true;
state1:
this.state = 1;
}
this.state = -1;
if (this.argsEnumerator != null) this.argsEnumerator.Dispose();
break;
case 2:
goto state1;
}
return false;
}
Of course the result can be completely different depending on the original code.
Just spotted this question - I wrote an article on it recently. I'll have to add the other links mentioned here to the article though...
The particular code sample you are looking at involves a series of transformations. Please note that this is an approximate description of the algorithm. The actual names used by the compiler, and the exact code it generates may be different. Then idea is the same, however.
The first transformation is the "foreach" transformation, which transforms this code:
into this code:
The second transformation finds all the yield return statements in the function body, assigns a number to each (a state value), and creates a "goto label" right after the yield.
The third transformation lifts all the local variables and function arguments in the method body into an object called a closure.
Given the code in your example, that would look similar to this:
The method body is then moved from the original method, to a method inside "Closure" called MoveNext, which returns a bool, and implements IEnumerable.MoveNext. Any access to any locals is routed through "this", and any access to any class members is routed through this.originalThis.
Any "yield return expr" is translated into:
Any yield break statement is translated into:
There is an "implicit" yield break statement at the end of the function. A switch statement is then introduced at the beginning of the procedure that looks at the state number and jumps to the associated label.
The original method is then translated into something like this:
The fact that the state of the method is all pushed into an object and that the MoveNext method uses a switch statement / state variable is what allows the iterator to behave as if control is being passed back to the point immediately after the last "yield return" statement the next time "MoveNext" is called.
It is important to point out, however, that the transformation used by the C# compiler is not the best way to do this. It suffers from poor performance when trying to use "yield" with recursive algorithms. There is a good paper that outlines a better way to do this here:
http://research.microsoft.com/en-us/projects/specsharp/iterators.pdf
It's worth a read if you haven't read it yet.
Raymond chen answers this; http://blogs.msdn.com/b/oldnewthing/archive/2008/08/12/8849519.aspx
(edited to point to part 1 of the series, not part 4)