Run multiple instances of the same method asynchronously

Posted 2019-02-15 09:52

Question:

My requirement is quite weird.

I have SomeMethod() which calls GetDataFor().

public void SomeMethod()
{
    for(int i = 0; i<100; i++) {
        var data = GetDataFor(i);
    }
}

public data GetDataFor(int i) {
    //call a remote API
    //to generate data for i
    //store to database
    return data;
}

For each i, the end result will always be different. There is no need to wait for GetDataFor(i) to complete before calling GetDataFor(i+1).

In other words I need to:

  • call GetDataFor() for i+1 immediately after starting the call for i, without waiting for it to finish (calling them all in parallel looks impossible)
  • wait until all the 100 instances of GetDataFor() are completed running
  • leave the scope of SomeMethod()

Following YK1's answer, I have tried to modify it like this:

public async Task SomeMethod()
{
    for(int i = 0; i < 100; i++) {
        var task = Task.Run(() => GetDataFor(i));
        var data = await task;
    }
}

It didn't throw any errors, but I need to understand the concept behind this:

  • How will the task distinguish between different calls when awaiting? It is getting overwritten.
  • Is this a blatantly wrong way to do it? If so, how do I do it right?

Answer 1:

You can use Parallel.For:

public void SomeMethod()
{
    Parallel.For(0, 100, i =>
    {
        var data = GetDataFor(i);
        //Do something
    });
}

public data GetDataFor(int i)
{
    //generate data for i
    return data;
}

EDIT:

The syntax of a parallel loop is very similar to the for and foreach loops you already know, but the parallel loop runs faster on a computer that has available cores. Another difference is that, unlike a sequential loop, the order of execution isn't defined for a parallel loop. Steps often take place at the same time, in parallel. Sometimes, two steps take place in the opposite order than they would if the loop were sequential. The only guarantee is that all of the loop's iterations will have run by the time the loop finishes.

For parallel loops, the degree of parallelism doesn't need to be specified by your code. Instead, the run-time environment executes the steps of the loop at the same time on as many cores as it can. The loop works correctly no matter how many cores are available. If there is only one core, the performance is close to (perhaps within a few percentage points of) the sequential equivalent. If there are multiple cores, performance improves; in many cases, performance improves proportionately with the number of cores.

You can see a more detailed explanation here.
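
As a rough sketch of how results could be collected from such a loop: each iteration writes into its own array slot, so the undefined execution order does not matter. The GetDataFor below is a hypothetical stand-in (it just squares i) for the remote call in the question, and CollectAll is an illustrative name, not part of the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical stand-in for the remote call in the question.
    public static int GetDataFor(int i)
    {
        return i * i;
    }

    public static int[] CollectAll(int count)
    {
        var results = new int[count];

        // Iterations may run in any order and on several threads,
        // but each one writes only to its own slot, so no lock is needed.
        Parallel.For(0, count, i =>
        {
            results[i] = GetDataFor(i);
        });

        // Parallel.For returns only after every iteration has finished.
        return results;
    }

    public static void Main()
    {
        int[] results = CollectAll(100);
        Console.WriteLine(results.Length);
    }
}
```

Note that Parallel.For blocks the calling thread until the loop is done, which matches the requirement of not leaving SomeMethod() before all calls complete.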



Answer 2:

I would instead add each of the tasks to a collection and then await on the entire collection AFTER the loop.

Awaiting inside the loop like that waits for each call to finish before the next iteration starts, and it creates a continuation per iteration, which is more overhead than you want.

Take a look at Task.WaitAll instead. Note that it blocks the calling thread and returns void, so it is called directly rather than awaited.

If the result of each task matters, await Task.WhenAll instead and then read the result of each task into your return collection.
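
A minimal sketch of that idea, assuming GetDataFor stays synchronous and is pushed onto the thread pool with Task.Run. The string-returning GetDataFor and the name SomeMethodAsync are hypothetical stand-ins for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for the synchronous GetDataFor in the question.
    public static string GetDataFor(int i)
    {
        return "data-" + i;
    }

    public static async Task<string[]> SomeMethodAsync(int count)
    {
        var tasks = new List<Task<string>>();

        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            tasks.Add(Task.Run(() => GetDataFor(index)));
        }

        // A single await after the loop. The result array follows the order
        // in which the tasks were added, not the order in which they finished.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        string[] all = SomeMethodAsync(100).GetAwaiter().GetResult();
        Console.WriteLine(all.Length);
    }
}
```

Main blocks with GetAwaiter().GetResult() only to keep the sketch a plain console program; in real asynchronous code the caller would await SomeMethodAsync instead.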



Answer 3:

There are a couple of different approaches.

First, you could keep it synchronous and just execute them in parallel (on different threads). Parallel LINQ is better than Parallel if you want to collect all the results in the calling method before continuing:

public data[] SomeMethod()
{
  return Enumerable.Range(0, 100)
      .AsParallel().AsOrdered()
      .Select(GetDataFor).ToArray();
}

Second, you could make it asynchronous. To make something truly asynchronous, you need to start at the lowest level (in this case, "call a remote API" and "store to database") and make that asynchronous first. Then you can make GetDataFor asynchronous:

public async Task<data> GetDataForAsync(int i)
{
  await .. //call a remote API asynchronously
  await .. //store to database asynchronously
  return data;
}

Then you can make SomeMethod asynchronous as well:

public Task<data[]> SomeMethodAsync()
{
  return Task.WhenAll(
      Enumerable.Range(0, 100).Select(GetDataForAsync)
  );
}

Making the code asynchronous is more work - more of the code has to change - but it's better in terms of scalability and resource use.



Answer 4:

When using async/await, you're essentially saying "whilst waiting for this task to finish, please go off and do some independent work that doesn't rely on this task". As you don't care about waiting for GetDataFor to finish, you don't really want to use async/await.

This previous question seems to have a very similar request as yours. With that in mind I think you should be able to do something like this:

public void SomeMethod()
{
    for (var i = 0; i < 100; i++)
    {
        var index = i; // copy so each lambda captures its own value
        Task.Run(() => GetDataFor(index));
    }
}

Basically, this assumes you don't need to wait for the GetDataFor to finish before doing anything else, it's literally 'fire and forget'.

With regards to Parallel.For, you are likely to see some improvement in performance so long as you have more than 1 core. If not, you will probably see an ever so slight decrease in performance (more overhead). Here's an article that helps explain how it works.

UPDATE

Following your comment then I would suggest something like:

var TList = new List<Task>();

for (var i = 0; i < 100; i++)
{
    var index = i; // copy: a lambda capturing i directly would see its later values
    TList.Add(Task.Run(() => GetDataFor(index)));
}

await Task.WhenAll(TList);

Here's a useful question that highlights why you might want to use WhenAll instead of WaitAll.

You might want to include some checking around the status of completion of the tasks to see which failed (if any). See here for an example.
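
One way such a check might look, as a sketch only: after Task.WhenAll has completed, every task is in a final state, so each one can be inspected individually. The failing GetDataFor and the name CountFailuresAsync are hypothetical and exist purely to demonstrate the check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FailureCheckSketch
{
    // Hypothetical stand-in that fails for one particular input.
    public static int GetDataFor(int i)
    {
        if (i == 3) throw new InvalidOperationException("failed for " + i);
        return i;
    }

    public static async Task<int> CountFailuresAsync(int count)
    {
        var tasks = new List<Task<int>>();
        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            tasks.Add(Task.Run(() => GetDataFor(index)));
        }

        try
        {
            await Task.WhenAll(tasks);
        }
        catch (Exception)
        {
            // await rethrows only the first exception;
            // the individual tasks below carry the full picture.
        }

        // WhenAll completes only once every task has finished,
        // so the status of each task is final at this point.
        return tasks.Count(t => t.IsFaulted);
    }

    public static void Main()
    {
        Console.WriteLine(CountFailuresAsync(10).GetAwaiter().GetResult());
    }
}
```

Each faulted task also exposes its error through its Exception property, which is where per-item logging or retry logic could hook in.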



Answer 5:

The code actually makes no sense.

How will the task distinguish between different calls when awaiting? It is getting overwritten.

It does not get overwritten. Because...

for(int i = 0; i < 100; i++) {
    var task = Task.Run(() => GetDataFor(i));
    var data = await task;
}

This is WAITING for every request to finish before continuing the loop. Await waits for the end.

Which means the whole task thing is irrelevant - nothing happens in parallel here. You can cut some minor overhead by doing it without a task.

I suspect the OP did not achieve what he intended, and did not spend enough time debugging to realize that the loop is effectively single-threaded again.
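
The difference can be made visible with a small sketch that counts how many calls are in progress at the same time. Everything here is an assumption for illustration: GetDataForAsync is a hypothetical asynchronous stand-in that uses Task.Delay in place of the remote call, and the two Peak methods are made-up names.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AwaitOrderSketch
{
    private static readonly object Gate = new object();
    private static int inFlight;
    private static int peak;

    // Hypothetical async stand-in for the remote call; it records
    // the highest number of calls that were in progress at once.
    public static async Task<int> GetDataForAsync(int i)
    {
        lock (Gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
        await Task.Delay(50);
        lock (Gate) { inFlight--; }
        return i;
    }

    private static void Reset()
    {
        lock (Gate) { inFlight = 0; peak = 0; }
    }

    public static async Task<int> PeakWhenAwaitingInLoop(int count)
    {
        Reset();
        for (int i = 0; i < count; i++)
        {
            // The next call starts only after this one has finished.
            await GetDataForAsync(i);
        }
        return peak;
    }

    public static async Task<int> PeakWhenAwaitingAll(int count)
    {
        Reset();
        // All calls are started first, then awaited together.
        await Task.WhenAll(Enumerable.Range(0, count).Select(GetDataForAsync));
        return peak;
    }

    public static void Main()
    {
        Console.WriteLine(PeakWhenAwaitingInLoop(5).GetAwaiter().GetResult());
        Console.WriteLine(PeakWhenAwaitingAll(5).GetAwaiter().GetResult());
    }
}
```

Awaiting inside the loop never has more than one call in progress, while the WhenAll version starts every call before the first delay elapses, so its peak is normally equal to the number of calls.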



Answer 6:

Although your original code is overwriting the values, it seems like you are trying to combine the results of parallel operations. If so, consider using Task.ContinueWith to process the return values. Your code would look something like this:

public void SomeMethod()
{
    List<Task> tasks = new List<Task>();
    for (var i = 0; i < 100; i++)
    {
        var index = i; // copy so each lambda captures its own value
        tasks.Add(Task.Run(() => GetDataFor(index)).ContinueWith((antecedent) => {
            // Process the results here (antecedent.Result holds the return value).
        }));
    }
    Task.WaitAll(tasks.ToArray());
}