Tail recursion with tasks?

2019-07-19 02:12发布

问题:

I was taking a look at the this article regarding tail recursion in C#

I get the general example that they used (Factorial) and I was trying to apply it to a situation in which I use Task recursion. I am having a problem trying to apply the example above with a Task that has no return type.

Imagine I have the following class and methods in C#:

public class Crawler(CancellationToken token)
{
    this.cancelToken = token;
}

public async Task CrawlData(string dataLocation, bool repeat = false)
{
    /*Most of processing done here, omitted for space...*/
    while(canCrawl == false)
    {
        await Task.Delay(250);
    }
    await CrawlData(dataLocation, repeat);
}

Currently this works great! It can run for a very long time (days, weeks, months) without issue because of a delay which is when canCrawl is equal to false.

If anyone is confused, the reason I am using recursion with Tasks is because I call this method from another class which relies on Task.Run() to initially execute this method, and it relies on that Task being alive as long as the recursion is still going so that execution in that main class does not jump to the end after the first call. This all has to do with being able to cancel the task while in progress.

My concern is that if this were to run for long enough, it would eventually lead to a StackOverflowException just as the Factorial example did in the link at the beginning of the post.

I've tried applying that example to a Task with no return type which is what I am using, but I am not having any luck.

Do any of you have experience with tail recursion in C# with Task?

回答1:

If you're asking whether you'll get a StackOverflowException the answer is a definite no.

Because the method is an async one, it doesn't preserve the stack when resuming after an await. So after a delay of 250ms the thread that keeps running will have a new and clean stack.

However, you will eventually get an OutOfMemoryException because each async method is transformed into a state machine struct which is boxed and kept alive until that method completes (which in your case is only after all its tail calls complete).

Task.Delay is too slow to simulate this behavior but you can do that with Task.Yield which completes asynchronously (like Task.Delay) but immediately (unlike Task.Delay which has a delay of at least 15ms). This example runs out of memory on my machine in less than a minute.

private static void Main()
{
    FooAsync().Wait();
}

private static async Task FooAsync()
{
    await Task.Yield();
    await FooAsync();
}

If you manage to have a task-returning method that isn't async but does what you need using continuations than it may avoid that exception, but that depends on the implementation.


"If anyone is confused, the reason I am using recursion with Tasks is because I call this method from another class which relies on Task.Run() to initially execute this method"

There's probably no reason to use Task.Run in this case. You can simply call the method and get a task as return-value.



回答2:

My concern is that if this were to run for long enough, it would eventually lead to a StackOverflowException just as the Factorial example did in the link at the beginning of the post.

It won't, because the state machine that the code gets compiled to isn't actually recursive in how it uses the stack. It could though run out of memory, and it would certainly be suboptimal.

Still, the same approach of doing tail-call elimination by hand can be applied here as you would if you had a tail-call method you wanted to optimise because the jitter wasn't doing tail-call elimination for you. (Or simply because even when it does hand-eliminated tail calls are often more performant with .NET anyway).

Consider if you had the equivalent synchronous method:

public void CrawlData(string dataLocation, bool repeat = false)
{
  /*Most of processing done here, omitted for space...*/
  while(canCrawl == false)
  {
    Thread.Sleep(250); // yes, not ideal but a reasonable analogy.
  }
  CrawlData(dataLocation, repeat);
}

You would turn this into iterative code by setting dataLocation and repeat to their new value (which is a no-op in your example, because you are calling them with the old values, and going back to the start:

public void CrawlData(string dataLocation, bool repeat = false)
{
  start:
  /*Most of processing done here, omitted for space...*/
  while(canCrawl == false)
  {
    Thread.Sleep(250);
  }
  goto start;
}

And then apply some velociraptor protection:

public void CrawlData(string dataLocation, bool repeat = false)
{
  for(;;)
  {
      /*Most of processing done here, omitted for space...*/
      while(canCrawl == false)
      {
        Thread.Sleep(250);
      }
  }
}

And there you have an iterative version. By analogy the iterative async version is:

public async Task CrawlDataAsync(string dataLocation, bool repeat = false)
{
  for(;;)
  {
      /*Most of processing done here, omitted for space...*/
      while(canCrawl == false)
      {
        await Task.Delay(250);
      }
  }
}

Let' also have a version that returns for comparison. Imagine you're non-async version was:

public void CrawlData(string dataLocation, bool repeat = false)
{
  /*Most of processing done here, omitted for space...*/
  while(canCrawl == false)
  {
    Thread.Sleep(250);
  }
  if(repeat)
    CrawlData(dataLocation, repeat);
}

Now, assuming something can change repeat in the "most of processing done here" space, then this can exit eventually. And the iterative equivalent is:

public void CrawlData(string dataLocation, bool repeat = false)
{
  do
  {
    /*Most of processing done here, omitted for space...*/
    while(canCrawl == false)
    {
      Thread.Sleep(250);
    }
  } while(repeat);
}

Of which the async equivalent is:

public async Task CrawlDataAsync(string dataLocation, bool repeat = false)
{
  do
  {
    /*Most of processing done here, omitted for space...*/
    while(canCrawl == false)
    {
      await Task.Delay(250);
    }
  } while(repeat);
}