I have about 5 million items to update. I don't really care about the response (it would be nice to have one so I can log it, but not if it costs me time). Having said that, is this code optimized to run as fast as possible? With 5 million items, do I run the risk of task-cancelled or timeout errors? I get about 1 or 2 responses back every second.
var tasks = items.Select(async item =>
{
    await Update(CreateUrl(item));
}).ToList();
if (tasks.Any())
{
    await Task.WhenAll(tasks);
}

private async Task<HttpResponseMessage> Update(string url)
{
    var client = new HttpClient();
    var response = await client.SendAsync(url).ConfigureAwait(false);
    //log response.
}
UPDATE: I am actually getting TaskCanceledExceptions. Did my system run out of threads? What could I do to avoid this?
A much better approach would be to use TPL Dataflow's ActionBlock with MaxDegreeOfParallelism and a single HttpClient:
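A minimal sketch of what that could look like, assuming the System.Threading.Tasks.Dataflow package is available, that the items are plain ints, and that the update is a GET request (the question does not say which method it uses). BulkUpdater, CreateUrl, UpdateAsync and the example.invalid URL are placeholders rather than the asker's real code, and the update delegate is passed in only to keep the sketch easy to test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BulkUpdater
{
    // One shared client for every request; HttpClient supports concurrent use.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical stand-in for the question's CreateUrl; the real URL scheme is unknown.
    public static string CreateUrl(int item)
    {
        return "https://example.invalid/items/" + item;
    }

    // Sends one request and logs the status code.
    // GET is an assumption made for this sketch.
    public static async Task UpdateAsync(string url)
    {
        using (var response = await Client.GetAsync(url).ConfigureAwait(false))
        {
            Console.WriteLine((int)response.StatusCode + " " + url);
        }
    }

    // Feeds every item to an ActionBlock that runs at most maxParallelism
    // updates at once, then hands back the block's completion task.
    public static Task UpdateAllAsync(
        IEnumerable<int> items, Func<string, Task> update, int maxParallelism)
    {
        var block = new ActionBlock<int>(
            item => update(CreateUrl(item)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelism });

        foreach (var item in items)
        {
            block.Post(item); // the block is unbounded by default, so items simply queue up
        }

        block.Complete();        // stop accepting input, finish what is queued
        return block.Completion; // returned directly, without a redundant async/await
    }
}
```

It could be called as await BulkUpdater.UpdateAllAsync(items, BulkUpdater.UpdateAsync, 20); where 20 is only a starting value to tune.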
HttpClient can be used concurrently for multiple requests, so it is much better to create and dispose of a single instance than 5 million.
ActionBlock caps the number of concurrent operations with MaxDegreeOfParallelism (which you should test and optimize for your specific case). It's important to note that TPL may choose a lower number when it deems it to be appropriate.
When an async method or lambda expression ends with a single async call, it's better for performance to remove the redundant async-await and just return the task (i.e. return block.Completion;).
Complete will notify the ActionBlock to not accept any more items, but to finish processing the items it already has. When it's done, the Completion task will be done, so you can await it.
I suspect you are suffering from outgoing connection management preventing large numbers of simultaneous connections to the same domain. The answers given in this extensive Q+A might give you some avenues to investigate.
What is limiting the # of simultaneous connections my ASP.NET application can make to a web service?
In terms of your code structure, I'd personally try to use a dynamic pool of connections. You know that you can't actually get 5 million connections simultaneously, so attempting it will just fail. You may as well work with a reasonable, configured limit of (for instance) 20 connections and use them in a pool. That way you can tune the limit up or down.
Alternatively, you could investigate HTTP pipelining (which I've not used), which is intended specifically for the job you are doing (batching up HTTP requests). http://en.wikipedia.org/wiki/HTTP_pipelining
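One way to get such a configurable limit is sketched below: a SemaphoreSlim gate that lets at most a fixed number of requests run at once, plus an HttpClient whose handler caps connections per server. The names Throttle, RunAsync and CreateClient are made up for illustration, and 20 is just the example figure mentioned above, not a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Builds a client whose handler opens at most maxConnections connections
    // per server. Assumes a runtime where HttpClientHandler exposes
    // MaxConnectionsPerServer (.NET Core, or a recent .NET Framework).
    public static HttpClient CreateClient(int maxConnections)
    {
        var handler = new HttpClientHandler { MaxConnectionsPerServer = maxConnections };
        return new HttpClient(handler);
    }

    // Runs work for every item, with at most limit operations in flight.
    public static async Task RunAsync<T>(IEnumerable<T> items, int limit, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(limit))
        {
            // Note: this list holds one task per item for the whole run.
            var running = new List<Task>();
            foreach (var item in items)
            {
                // Wait for a free slot before starting the next operation.
                await gate.WaitAsync().ConfigureAwait(false);
                running.Add(RunOneAsync(item, work, gate));
            }
            await Task.WhenAll(running).ConfigureAwait(false);
        }
    }

    // Runs a single operation and always gives its slot back.
    private static async Task RunOneAsync<T>(T item, Func<T, Task> work, SemaphoreSlim gate)
    {
        try
        {
            await work(item).ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Raising or lowering the limit argument is the tuning knob; a call might look like await Throttle.RunAsync(urls, 20, url => client.GetAsync(url)); with client created once via Throttle.CreateClient(20).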
Your method will kick off all the tasks at the same time, which may not be what you want. No threads would be involved, because with async operations there is no thread, but there may be limits on the number of concurrent connections.
There may be better tools to do this, but if you want to use async/await, one option is to use Stephen Toub's ForEachAsync, as documented in this article. It allows you to control how many simultaneous operations you want to execute, so you don't overrun your connection limit.
Here it is from the article:
Usage:
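A sketch of such a ForEachAsync helper and a call to it, written from my recollection of the partition-based approach rather than copied verbatim from the article. The AsyncExtensions and Demo names, the int items, the degree of parallelism of 3 and the Task.Delay standing in for the HTTP call are all assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncExtensions
{
    // Splits the source into dop partitions and drains each partition
    // sequentially, so at most dop invocations of body run at any time.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                    {
                        await body(partition.Current).ConfigureAwait(false);
                    }
                }
            }));
    }
}

public static class Demo
{
    // Usage: process ten items with at most three operations in flight.
    public static Task RunAsync()
    {
        return Enumerable.Range(1, 10).ForEachAsync(3, async item =>
        {
            await Task.Delay(10); // placeholder for Update(CreateUrl(item))
            Console.WriteLine("updated " + item);
        });
    }
}
```

With the question's code this would read roughly as await items.ForEachAsync(20, item => Update(CreateUrl(item))); where 20 is a value to tune against the connection limit.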