await thousands of Tasks

Posted 2020-07-27 02:48

I have an application which converts some data; often there are 1,000 - 30,000 files.

I need to do 3 steps:

  1. Copy a file (replacing some text in it)
  2. Make a web request with WebClient to download a file (I send the copied file to a web server, which converts it to another format)
  3. Take the downloaded file and change some of its content

All three steps involve I/O, so I used async/await methods:

var tasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file).ConfigureAwait(false);
    await convert(item).ConfigureAwait(false);
    await clean(item).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);

I don't know if this is best practice, because I create more than a thousand tasks. I thought about splitting the three steps, like this:

List<Item> items = new List<Item>();
var tasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file, ext).ConfigureAwait(false);
    lock(items)
        items.Add(item);
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);

// reuse the tasks variable; redeclaring it with var would not compile
tasks = items.Select(async (item) =>
{
    await convert(item, baseAddress, ext).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);

tasks = items.Select(async (item) =>
{
    await clean(targetFile, item.Doctype, ext).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);

But that doesn't seem to be any better or faster, because I create thousands of tasks three times over.

Should I throttle the creation of tasks, say into chunks of 100? Or am I just overthinking it, and creating thousands of tasks is fine?
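One way I can imagine throttling (just a sketch using SemaphoreSlim; the limit of 100 is an arbitrary choice, and createtempFile/convert/clean are my methods from above):

var throttle = new SemaphoreSlim(100);

var tasks = files.Select(async (file) =>
{
    // wait for a free slot before starting the work for this file
    await throttle.WaitAsync().ConfigureAwait(false);
    try
    {
        Item item = await createtempFile(file).ConfigureAwait(false);
        await convert(item).ConfigureAwait(false);
        await clean(item).ConfigureAwait(false);
    }
    finally
    {
        // always release the slot, even if a step throws
        throttle.Release();
    }
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);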

The CPU idles at a 2-4% peak, so I wondered whether there are too many awaits or context switches.

Maybe there are too many WebRequest calls, because the web server/web service can't handle thousands of requests simultaneously, and I should throttle only the web requests?

I have already increased the .NET maxconnection setting in the app.config file.
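For reference, this is the setting I changed (the value 100 is just the number I picked):

<configuration>
  <system.net>
    <connectionManagement>
      <!-- raises the default limit of 2 concurrent connections per host -->
      <add address="*" maxconnection="100" />
    </connectionManagement>
  </system.net>
</configuration>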

2 Answers
贪生不怕死
2020-07-27 03:04

As commenters have correctly noted, you're overthinking it. The .NET runtime has absolutely no problem tracking thousands of tasks.

However, you might want to consider using a TPL Dataflow pipeline, which would enable you to easily have different concurrency levels for different operations ("blocks") in your pipeline.
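As a minimal sketch of what such a pipeline could look like, assuming the createtempFile/convert/clean signatures from your first snippet (the MaxDegreeOfParallelism values are arbitrary examples):

// requires the System.Threading.Tasks.Dataflow NuGet package
using System.Threading.Tasks.Dataflow;

// each block gets its own concurrency limit
var copyBlock = new TransformBlock<string, Item>(
    file => createtempFile(file),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

var convertBlock = new TransformBlock<Item, Item>(
    async item => { await convert(item); return item; },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 }); // keep the web requests low

var cleanBlock = new ActionBlock<Item>(
    item => clean(item),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

// propagate completion so finishing the first block drains the whole pipeline
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
copyBlock.LinkTo(convertBlock, linkOptions);
convertBlock.LinkTo(cleanBlock, linkOptions);

foreach (var file in files)
    copyBlock.Post(file);

copyBlock.Complete();
await cleanBlock.Completion;

This way each stage runs at its own pace, and you can give the web-request stage a lower limit than the purely local file steps.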

SAY GOODBYE
2020-07-27 03:23

It is possible to execute async operations in parallel while limiting the number of concurrent operations. There is a handy extension method for that (it is not part of the .NET Framework):

/// <summary>
/// Enumerates a collection in parallel and calls an async method on each item. Useful for making 
/// parallel async calls, e.g. independent web requests when the degree of parallelism needs to be
/// limited.
/// </summary>
public static Task ForEachAsync<T>(this IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> action)
{
    // Partitioner lives in System.Collections.Concurrent; each partition
    // processes its items sequentially, so at most degreeOfParallelism
    // operations run at the same time.
    return Task.WhenAll(Partitioner.Create(source).GetPartitions(degreeOfParallelism).Select(partition => Task.Run(async () =>
    {
        using (partition)
            while (partition.MoveNext())
                await action(partition.Current);
    })));
}

Call it like this:

var files = new List<string> {"one", "two", "three"};
await files.ForEachAsync(5, async file =>
{
    // do async stuff here with the file
    await Task.Delay(1000);
});