My requirement is quite weird.
I have SomeMethod(), which calls GetDataFor().
    public void SomeMethod()
    {
        for (int i = 0; i < 100; i++) {
            var data = GetDataFor(i);
        }
    }
    public data GetDataFor(int i) {
        //call a remote API
        //to generate data for i
        //store to database
        return data;
    }
For each i, the end result will always be different. There is no need to wait for GetDataFor(i) to complete before calling GetDataFor(i+1).
In other words I need to:
- call GetDataFor() for each i+1 immediately after successfully calling it for i (calling them in parallel looks impossible)
- wait until all 100 instances of GetDataFor() have finished running
- leave the scope of SomeMethod()
Following YK1's answer, I have tried to modify it like this:
    public async Task SomeMethod()
    {
        for (int i = 0; i < 100; i++) {
            var task = Task.Run(() => GetDataFor(i));
            var data = await task;
        }
    }
It didn't throw any errors, but I need to understand the concept behind this:
- How will task distinguish between the different calls when awaiting? It is getting overwritten.
- Is this a blatantly wrong way to do it? If so, how do I do it right?
You can use Parallel.For.
EDIT:
The syntax of a parallel loop is very similar to the for and foreach loops you already know, but the parallel loop runs faster on a computer that has available cores. Another difference is that, unlike a sequential loop, the order of execution isn't defined for a parallel loop. Steps often take place at the same time, in parallel. Sometimes, two steps take place in the opposite order than they would if the loop were sequential. The only guarantee is that all of the loop's iterations will have run by the time the loop finishes.
For parallel loops, the degree of parallelism doesn't need to be specified by your code. Instead, the run-time environment executes the steps of the loop at the same time on as many cores as it can. The loop works correctly no matter how many cores are available. If there is only one core, the performance is close to (perhaps within a few percentage points of) the sequential equivalent. If there are multiple cores, performance improves; in many cases, performance improves proportionately with the number of cores.
You can see a more detailed explanation here.
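The Parallel.For suggestion in this answer could be sketched roughly as follows. The body of GetDataFor here is a hypothetical stand-in (a short sleep instead of the real remote call and database write), and the class name and count parameter are only for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical stand-in for the remote API call plus database write.
    public static string GetDataFor(int i)
    {
        Thread.Sleep(10); // simulate blocking work
        return $"data-{i}";
    }

    public static string[] SomeMethod(int count)
    {
        var results = new string[count];

        // Parallel.For returns only after every iteration has run.
        // Each iteration writes to its own array slot, so no locking is needed.
        Parallel.For(0, count, i =>
        {
            results[i] = GetDataFor(i);
        });

        return results;
    }
}
```

Calling ParallelForSketch.SomeMethod(100) blocks the caller until all 100 results are in the array, which matches the "wait, then leave the scope" requirement in the question.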
I would instead add each of the tasks to a collection and then await the entire collection AFTER the loop.
Awaiting inside the loop like that creates a continuation per iteration and, I believe, more overhead than desirable, and it waits for each call to finish before continuing the loop.
If you only need to block until everything is done, take a look at Task.WaitAll (it blocks the calling thread and is not awaited).
If instead the value of each task is important to process, then look at awaiting Task.WhenAll and then read the result of each Task into your return collection.
When using async/await you're essentially saying "whilst waiting for this task to finish, please go off and do some independent work that doesn't rely on this task". As you don't care about waiting for GetDataFor to finish, you don't really want to use async/await.
This previous question seems to have a very similar request to yours. With that in mind I think you should be able to do something like this:
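A minimal sketch of starting each call without waiting for it, under some assumptions: GetDataFor is replaced by a hypothetical stand-in that sleeps briefly, and the Completed counter exists only so the effect can be observed.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetSketch
{
    // Demo-only counter of finished calls (not part of the original code).
    public static int Completed;

    // Hypothetical stand-in for the remote API call plus database write.
    public static void GetDataFor(int i)
    {
        Thread.Sleep(10); // simulate blocking work
        Interlocked.Increment(ref Completed);
    }

    public static void SomeMethod(int count)
    {
        for (int i = 0; i < count; i++)
        {
            int index = i; // copy the loop variable so each lambda sees its own value
            _ = Task.Run(() => GetDataFor(index)); // start the work and discard the task
        }
        // Returns right away; some calls may still be running at this point.
    }
}
```

Because the tasks are discarded, any exception thrown inside GetDataFor goes unobserved by the caller, so this only fits work whose outcome really does not matter to SomeMethod.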
Basically, this assumes you don't need to wait for the GetDataFor to finish before doing anything else, it's literally 'fire and forget'.
With regards to Parallel.For, you are likely to see some improvement in performance so long as you have more than 1 core. If not, you will probably see an ever so slight decrease in performance (more overhead). Here's an article that helps explain how it works.
UPDATE
Following your comment then I would suggest something like:
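One possible shape for this, as a sketch rather than the exact code: start every call with Task.Run, keep the tasks in a list, and await Task.WhenAll after the loop. The GetDataFor body is a hypothetical stand-in and the class name is illustrative.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for the remote API call plus database write.
    public static string GetDataFor(int i)
    {
        Thread.Sleep(10); // simulate blocking work
        return $"data-{i}";
    }

    public static async Task<string[]> SomeMethodAsync(int count)
    {
        var tasks = new List<Task<string>>(count);
        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            tasks.Add(Task.Run(() => GetDataFor(index))); // start, but do not await yet
        }

        // Completes once every task has finished; the results array is in
        // the same order as the tasks list.
        return await Task.WhenAll(tasks);
    }
}
```

The method does not complete until all calls have finished, yet no call waits for the previous one.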
Here's a useful question that highlights why you might want to use WhenAll instead of WaitAll.
You might want to include some checking around the status of completion of the tasks to see which failed (if any). See here for an example.
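As a rough illustration of that status check: awaiting Task.WhenAll rethrows only the first exception, so the individual tasks are inspected afterwards. The failure rule below (every multiple of 10 throws) is invented purely to have something to count, and GetDataForAsync is a hypothetical asynchronous stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FailureCheckSketch
{
    // Hypothetical stand-in that fails for multiples of 10 to simulate errors.
    public static async Task<string> GetDataForAsync(int i)
    {
        await Task.Delay(5);
        if (i % 10 == 0) throw new InvalidOperationException($"failed for {i}");
        return $"data-{i}";
    }

    public static async Task<(int Succeeded, int Failed)> SomeMethodAsync(int count)
    {
        List<Task<string>> tasks = Enumerable.Range(0, count)
            .Select(i => GetDataForAsync(i))
            .ToList();

        try
        {
            await Task.WhenAll(tasks);
        }
        catch (InvalidOperationException)
        {
            // Only the first failure surfaces here; every task has still
            // finished by now, so each one can be inspected below.
        }

        int failed = tasks.Count(t => t.IsFaulted);
        int succeeded = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
        return (succeeded, failed);
    }
}
```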
The code actually makes no sense.
It does not get overwritten. Because...
This is WAITING for every request to finish before continuing the loop. Await waits for the end.
Which means the whole task thing is irrelevant - nothing happens in parallel here. You can cut some minor overhead by doing it without a task.
I suspect the OP wanted to achieve something that this code simply does not do, and did not spend enough time debugging to realize that the whole loop has been made single-threaded again.
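A small sketch that makes the sequential behaviour visible, assuming a hypothetical work item that just logs its start and end. With await inside the loop, each "end" is logged before the next "start".

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SequentialAwaitSketch
{
    public static async Task<List<string>> SomeMethodAsync(int count)
    {
        var log = new List<string>();
        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so the lambda captures this iteration's value
            var task = Task.Run(() =>
            {
                lock (log) log.Add($"start {index}");
                Thread.Sleep(5); // hypothetical stand-in for the real work
                lock (log) log.Add($"end {index}");
                return index;
            });

            // The loop does not move on until this task has finished,
            // so at most one task is ever running.
            await task;
        }
        return log;
    }
}
```

For a count of 3 the log reads start 0, end 0, start 1, end 1, start 2, end 2, which is the same order a plain synchronous loop would produce.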
There's a couple of different approaches.
First, you could keep it synchronous and just execute them in parallel (on different threads). Parallel LINQ is better than Parallel if you want to collect all the results in the calling method before continuing.
Second, you could make it asynchronous. To make something truly asynchronous, you need to start at the lowest level (in this case, "call a remote API" and "store to database") and make that asynchronous first. Then you can make GetDataFor asynchronous.
Then you can make SomeMethod asynchronous as well.
Making the code asynchronous is more work - more of the code has to change - but it's better in terms of scalability and resource use.
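The two approaches described above might look roughly like this. Everything inside the GetDataFor and GetDataForAsync bodies is a hypothetical stand-in (a sleep or a delay in place of the remote API call and database write), and the Async-suffixed names are illustrative.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TwoApproachesSketch
{
    // Approach 1: keep GetDataFor synchronous and run it on several threads.
    public static string GetDataFor(int i)
    {
        Thread.Sleep(10); // hypothetical blocking remote call + database write
        return $"data-{i}";
    }

    public static string[] SomeMethod(int count)
    {
        return Enumerable.Range(0, count)
            .AsParallel()
            .AsOrdered() // keep results in input order
            .Select(i => GetDataFor(i))
            .ToArray(); // blocks until every item has been processed
    }

    // Approach 2: make the lowest-level operations asynchronous first,
    // then let the callers await them.
    public static async Task<string> GetDataForAsync(int i)
    {
        await Task.Delay(10); // hypothetical async remote call + database write
        return $"data-{i}";
    }

    public static async Task<string[]> SomeMethodAsync(int count)
    {
        // Select starts one task per value when WhenAll enumerates the sequence.
        var tasks = Enumerable.Range(0, count).Select(i => GetDataForAsync(i));
        return await Task.WhenAll(tasks);
    }
}
```

The first version ties up a thread per in-flight call while it waits; the second does not hold threads during the delay, which is the scalability difference the answer refers to.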
Although your original code is overwriting the values, it seems like you are trying to combine the results of parallel operations. If so, consider using Task.ContinueWith to process the return values. Your code would look something like this:
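A sketch of how ContinueWith could be used for this, with several assumptions: GetDataFor is a hypothetical stand-in, the results are combined into a ConcurrentBag (any thread-safe collection would do), and error handling is left out.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ContinueWithSketch
{
    // Hypothetical stand-in for the remote API call plus database write.
    public static string GetDataFor(int i)
    {
        Thread.Sleep(10); // simulate blocking work
        return $"data-{i}";
    }

    public static async Task<int> SomeMethodAsync(int count)
    {
        var combined = new ConcurrentBag<string>(); // thread-safe, unordered
        var continuations = new Task[count];

        for (int i = 0; i < count; i++)
        {
            int index = i; // copy so each lambda captures its own value
            continuations[index] = Task.Run(() => GetDataFor(index))
                // Runs once the call has finished and stores its return value.
                .ContinueWith(t => combined.Add(t.Result));
        }

        // Wait for every continuation, and therefore every call, to finish.
        await Task.WhenAll(continuations);
        return combined.Count;
    }
}
```

If a call throws, reading t.Result inside the continuation rethrows, so the failure surfaces when the continuations are awaited.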