How to efficiently make 1000s of web requests as q

Posted 2019-05-17 07:52

I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.

Presumably I'd want to use the async web request methods, but I'm wondering what the overhead of storing all the Task continuations and marshalling would cost.

Memory consumption is not an overall concern, the objective is speed.

Presumably I'd also want to make use of all the cores available.

So I can do something like this:

Parallel.ForEach(iterations, i =>
{
    var response = await MakeRequest(i);
    // do thing with response
});

but that won't make me any faster than just my number of cores...

I can do:

Parallel.ForEach(iterations, i =>
{
    var response = MakeRequest(i);
    response.GetAwaiter().OnCompleted(() =>
    {
        // do thing with response
    });
});

but how do I keep my program running after the ForEach? Holding on to all the Tasks and calling WhenAll on them feels bloated. Are there any existing patterns or helpers that provide some kind of Task queue?

Is there any way to do better, and how should I handle throttling and error detection? For instance, if the remote endpoint is slow to respond, I don't want to keep spamming it.

I understand I also need to do:

ServicePointManager.DefaultConnectionLimit = int.MaxValue;

Anything else necessary?

3 answers
做个烂人
#2 · 2019-05-17 08:09

The Parallel class does not work with async loop bodies so you can't use it. Your loop body completes almost immediately and returns a task. There is no parallelism benefit here.

This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given degree of parallelism (DOP). This one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx (use the last piece of code).

You need to empirically determine the right DOP. Simply try different values. There is no theoretical way to derive the best value because it is dependent on many things.
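As a rough illustration of processing items asynchronously with a fixed DOP, here is a minimal sketch built on Partitioner, in the spirit of the approach the linked post describes. It is not a copy of that post's code. The names AsyncThrottle, ForEachAsync, DopDemo and the simulated MakeRequestAsync are hypothetical and stand in for the real request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncThrottle
{
    // Runs body over every item with at most `dop` invocations in flight.
    public static Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        // Each partition is drained sequentially by one async worker,
        // so the number of partitions caps the concurrency.
        var workers = Partitioner.Create(source)
            .GetPartitions(dop)
            .Select(partition => Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                        await body(partition.Current);
                }
            }));
        return Task.WhenAll(workers);
    }
}

public static class DopDemo
{
    // Hypothetical stand-in for a real HTTP call.
    static async Task<string> MakeRequestAsync(int i)
    {
        await Task.Delay(10);
        return "response " + i;
    }

    // Issues `count` simulated requests with at most `dop` in flight
    // and returns how many responses were captured.
    public static async Task<int> RunAsync(int count, int dop)
    {
        var responses = new ConcurrentBag<string>();
        await AsyncThrottle.ForEachAsync(Enumerable.Range(0, count), dop, async i =>
        {
            responses.Add(await MakeRequestAsync(i)); // do thing with response
        });
        return responses.Count;
    }

    public static void Main()
    {
        Console.WriteLine(DopDemo.RunAsync(200, 20).GetAwaiter().GetResult());
    }
}
```

The dop argument is the value to tune empirically: rerun with different values against the real endpoint and compare total elapsed time.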

The connection limit is the only limit that's in your way.

response.GetAwaiter().OnCompleted

Not sure what you tried to accomplish there... If you comment I'll explain the misunderstanding.

孤傲高冷的网名
#3 · 2019-05-17 08:10

The operation you want to perform is

  1. Call an I/O method
  2. Process the result

You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.

You will benefit from parallelism in the second part, processing the result, because that is a CPU-bound operation. Luckily, async/await does all the work for you. Console applications don't have a synchronization context, which means that the part of the method after an await runs on a thread pool thread, so the processing is spread across all CPU cores.

private async Task MakeRequestAndProcessResult(int i)
{
    // Pass the item through so each call requests something different.
    var result = await MakeRequestAsync(i);
    ProcessResult(result);
}

var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();

To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).

var result = await MakeRequestAsync(i).ConfigureAwait(false);

To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
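To make the waiting step concrete, here is a minimal console sketch. It assumes a simulated MakeRequestAsync and a trivial ProcessResult (both hypothetical), and it returns the processed values so that the responses are captured. The class name WhenAllDemo is also just for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for the real I/O call.
    static async Task<int> MakeRequestAsync(int i)
    {
        await Task.Delay(10);
        return i * 2;
    }

    // Hypothetical CPU-bound processing step.
    static int ProcessResult(int result)
    {
        return result + 1;
    }

    static async Task<int> MakeRequestAndProcessResult(int i)
    {
        int result = await MakeRequestAsync(i);
        return ProcessResult(result);
    }

    public static async Task<int[]> RunAsync(int count)
    {
        // Starting the tasks needs only one thread; nothing blocks here.
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(i => MakeRequestAndProcessResult(i))
            .ToArray();

        // WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        // Blocking once at the top of a console app keeps the process alive
        // until every request has completed.
        int[] results = WhenAllDemo.RunAsync(1000).GetAwaiter().GetResult();
        Console.WriteLine(results.Length);
    }
}
```

If any request throws, the task returned by Task.WhenAll faults, so this single wait is also a natural place to observe errors.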

Throwing 100k requests at a web service at once will probably kill it, so you will have to limit the number in flight. You can check the answers to this question for some options for how to do it.
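One common way to cap the number of requests in flight is a SemaphoreSlim used as an async gate. This is offered as an assumption about what such a limit could look like, not as what the referenced answers necessarily recommend. MakeRequestAsync is a simulated, hypothetical request that also records the peak concurrency so the cap can be checked.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    static readonly object Sync = new object();
    static int inFlight;
    static int peak;

    // Hypothetical request that tracks how many calls overlap.
    static async Task<string> MakeRequestAsync(int i)
    {
        lock (Sync) { inFlight++; if (inFlight > peak) peak = inFlight; }
        await Task.Delay(5);
        lock (Sync) { inFlight--; }
        return "response " + i;
    }

    // Runs `count` requests, never more than `maxConcurrent` at once,
    // and returns the highest concurrency that was observed.
    public static async Task<int> RunAsync(int count, int maxConcurrent)
    {
        lock (Sync) { inFlight = 0; peak = 0; }
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            Task[] tasks = Enumerable.Range(0, count).Select(async i =>
            {
                await gate.WaitAsync();      // wait for a free slot without blocking a thread
                try
                {
                    string response = await MakeRequestAsync(i);
                    // do thing with response
                }
                finally
                {
                    gate.Release();          // free the slot even if the request failed
                }
            }).ToArray();

            await Task.WhenAll(tasks);
        }
        lock (Sync) { return peak; }
    }

    public static void Main()
    {
        int observed = ThrottleDemo.RunAsync(500, 25).GetAwaiter().GetResult();
        Console.WriteLine("peak concurrency: " + observed);
    }
}
```

All the tasks are created up front, but each one parks at WaitAsync until a slot is free, so the remote endpoint never sees more than maxConcurrent simultaneous calls.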

SAY GOODBYE
#4 · 2019-05-17 08:12

Parallel.ForEach should be able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach where there is that parameter) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
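A minimal sketch of that overload follows. It assumes a blocking (synchronous) MakeRequest, since Parallel.ForEach does not await async loop bodies. MakeRequest here is a hypothetical stand-in that just sleeps, and the class name is illustrative only.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelOptionsDemo
{
    // Hypothetical blocking request; a real one would hold its thread while waiting on I/O.
    static string MakeRequest(int i)
    {
        Thread.Sleep(10);
        return "response " + i;
    }

    // Runs `count` blocking requests and returns how many responses were processed.
    public static int Run(int count, int maxDop)
    {
        // MaxDegreeOfParallelism is an upper bound on concurrent loop bodies,
        // not a guarantee that this many threads will actually be used.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };
        int processed = 0;

        Parallel.ForEach(Enumerable.Range(0, count), options, i =>
        {
            string response = MakeRequest(i);
            if (response.Length > 0)
                Interlocked.Increment(ref processed); // do thing with response
        });

        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(ParallelOptionsDemo.Run(200, 50));
    }
}
```

Each in-flight request occupies a thread for its whole duration in this variant, which is the threading overhead the async approaches avoid.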

You should be able to set this to 1,000 to allow it to use up to 1,000 threads, or even more, but that might not be efficient because of the threading overhead. You may wish to experiment (e.g. loop from 100 to 1,000 in steps of 100, submit 1,000 requests each time, and time each run from start to finish) or even set up some kind of self-tuning algorithm.
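The experiment could be sketched roughly as below. To keep it self-contained it uses a simulated async request (FakeRequestAsync, hypothetical, a fixed 20 ms delay) and a SemaphoreSlim cap rather than real threads and a real endpoint, so the timings only demonstrate the measuring loop and say nothing about any actual service.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DopSweep
{
    // Hypothetical request with a fixed simulated latency.
    static Task FakeRequestAsync()
    {
        return Task.Delay(20);
    }

    // Runs `requests` simulated calls with at most `dop` in flight.
    static async Task RunBatchAsync(int requests, int dop)
    {
        using (var gate = new SemaphoreSlim(dop))
        {
            Task[] tasks = Enumerable.Range(0, requests).Select(async _ =>
            {
                await gate.WaitAsync();
                try { await FakeRequestAsync(); }
                finally { gate.Release(); }
            }).ToArray();
            await Task.WhenAll(tasks);
        }
    }

    // Times one batch per candidate DOP and returns the elapsed time for each.
    public static async Task<Dictionary<int, TimeSpan>> MeasureAsync(int requests, IEnumerable<int> dops)
    {
        var timings = new Dictionary<int, TimeSpan>();
        foreach (int dop in dops)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            await RunBatchAsync(requests, dop);
            timings[dop] = stopwatch.Elapsed;
        }
        return timings;
    }

    public static void Main()
    {
        // Candidate values 100, 200, ..., 1000, as suggested above.
        IEnumerable<int> dops = Enumerable.Range(1, 10).Select(n => n * 100);
        foreach (var entry in DopSweep.MeasureAsync(1000, dops).GetAwaiter().GetResult())
            Console.WriteLine("DOP " + entry.Key + ": " + entry.Value.TotalMilliseconds.ToString("F0") + " ms");
    }
}
```

Against a real endpoint the fastest setting depends on the server, the network and the connection limit, so the sweep should be repeated in the environment where the app will actually run.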
