Is it possible to fire 3000 parallel asynchronous requests?

Posted 2019-09-22 00:22

Question:

I have a Web API controller that receives a filter object from the front-end.

The filter is basically just a JSON array of objects containing the filtering criteria.

Now, based on that filter, I have to run a query against an Azure API (the Azure Log Analytics API).

This is illustrated in the following code snippet:

    var tasks = filters.Select(filter =>
    {
        try
        {
            return service.GetPerformanceCounters(filter.SComputerId, filter.WorkSpace);
        }
        catch (Exception ex)
        {
            return Task.Run(() => new List<Metrics>());
        }
    })
    .ToList();

    var metrics = (await Task.WhenAll(tasks))
        .SelectMany(metric => metric)
        .ToList();
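(A side note on the snippet above: the catch only traps exceptions thrown synchronously while the tasks are being created; a fault inside the HTTP call itself only surfaces later, at the await Task.WhenAll. A minimal variant that actually observes those faults, reusing the same names from the snippet:)

    var tasks = filters.Select(async filter =>
    {
        try
        {
            // Awaiting here means the catch sees faults from the call itself,
            // not just from task creation.
            return await service.GetPerformanceCounters(filter.SComputerId, filter.WorkSpace);
        }
        catch (Exception)
        {
            return new List<Metrics>(); // fall back to an empty result, as intended
        }
    }).ToList();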

This basically means that if I have 3,000 filter objects, I'll have to run 3,000 parallel asynchronous requests.

But when I tested this code, the parallel asynchronous tasks started to fail once I reached about 100 filter objects.

I get the following exception in the debug panel:

Exception thrown: 'System.Net.Http.HttpRequestException' in Azure.DAL.dll
Exception thrown: 'System.Net.Http.HttpRequestException' in System.Private.CoreLib.dll
(the two lines above repeat, 16 exception entries in total)
The thread 0x3eac has exited with code 0 (0x0).
The thread 0x2294 has exited with code 0 (0x0).
The thread 0x66b0 has exited with code 0 (0x0).
The thread 0x6958 has exited with code 0 (0x0).
The thread 0xb5c has exited with code 0 (0x0).
The thread 0x6a98 has exited with code 0 (0x0).
The thread 0x16e8 has exited with code 0 (0x0).
The thread 0x28f8 has exited with code 0 (0x0).

How can I run more than 100 async HTTP requests without hitting this problem? I have no control over the number of filters sent from the front-end, but I need to be able to run as many parallel async operations as there are filters.

Edit: official documentation for the requested endpoint is linked below.

https://dev.loganalytics.io/documentation/Using-the-API/Limits

The doc states that:

Using API key authentication to sample data, throttling rules are applied per client IP address. Each client IP address is able to make up to 200 requests per 30 seconds, with no cap on total calls per day.
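Worth noting what that limit implies: 3,000 requests at 200 per 30-second window means at least 3000 / 200 = 15 windows, i.e. roughly 15 × 30 s ≈ 7.5 minutes of wall-clock time, no matter how the calls are parallelized on the client.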

Edit: GetPerformanceCounters implementation.

    public async Task<List<Entities.Metrics>> GetPerformanceCounters(string sourceComputerId, string workspaceId)
    {
        Debug.WriteLine("New Task launched");

        // The Log Analytics query: latest Perf record for the given source computer.
        var query = $@"Perf
        | where SourceComputerId == '{sourceComputerId}'
        | summarize arg_max(TimeGenerated, *) by SourceComputerId";

        // Send the query to the Kusto engine.
        var response = await RemoteKustoProvider.RunWorkSpaceQueryAsync
        (
            workspace: workspaceId,
            query: query
        );

        Debug.WriteLine("Task finished");

        return JsonConvert.DeserializeObject<List<Entities.Metrics>>(response);
    }

Thanks in advance for helping me to solve this issue.

Answer 1:

There's a number of issues here. First and foremost, you need to realize that all active work requires a thread. Those threads come from the thread pool, which is generally set to 1000 out of the box. That means right from the start, you're exhausting the thread pool with that many tasks, so likely 2000 of them are being queued. Async helps, but it's not magic. Some of the threads may end up returning to the pool once the request goes out, but thread switches are not guaranteed. Regardless, you're still draining the thread pool faster than you can refill it.
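To make that concrete: a common way to keep the pool from being flooded is to gate the work with a SemaphoreSlim, so only a bounded number of requests is in flight at once. A minimal sketch reusing the question's names (the cap of 50 is just a tuning guess, not something from the post):

    var throttler = new SemaphoreSlim(50); // max concurrent requests; tune to taste

    var tasks = filters.Select(async filter =>
    {
        await throttler.WaitAsync();
        try
        {
            // At most 50 calls are ever in flight; the rest wait here.
            return await service.GetPerformanceCounters(filter.SComputerId, filter.WorkSpace);
        }
        finally
        {
            throttler.Release();
        }
    }).ToList();

    var metrics = (await Task.WhenAll(tasks)).SelectMany(m => m).ToList();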

Then, it's not clear what's actually happening here, because there are multiple layers and you're spoon-feeding us one layer at a time. You posted what GetPerformanceCounters does, but now it's not clear what RemoteKustoProvider.RunWorkSpaceQueryAsync does. At some point you're using something like HttpClient, though, in order to eventually make the request to Azure. Depending on how you're doing that (and frankly, based on the quality of this code, I have no great hope you're doing it correctly), you're likely also exhausting the connection pool. That's a far more constricted resource, which means your throughput is drastically reduced, causing further backup down the chain.
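On the connection-pool point: if RunWorkSpaceQueryAsync creates and disposes an HttpClient per call (speculation on my part, since that code wasn't posted), every disposed client leaves its sockets behind in TIME_WAIT and the pool runs dry. The standard fix is one shared instance; a sketch, with the endpoint shape purely illustrative:

    // Assumes: using System.Text; using Newtonsoft.Json;
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> RunWorkSpaceQueryAsync(string workspace, string query)
    {
        // Illustrative request construction only; the real provider wasn't shown.
        var body = new StringContent(
            JsonConvert.SerializeObject(new { query }),
            Encoding.UTF8,
            "application/json");

        var response = await Client.PostAsync(
            $"https://api.loganalytics.io/v1/workspaces/{workspace}/query", body);

        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }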

Next, even if you could just blast out all 3000 requests at once, that essentially amounts to a DDoS attack on the recipient server. I wouldn't be surprised at all if you're actually being throttled or outright blocked by firewalls on the other end. That then of course - you guessed it - bottlenecks everything else down the line as well.

Finally, even if all of this functioned as you intended, firing off 3000 tasks is not the same thing as "parallel processing". For that you'd need to utilize the Parallel class from the TPL.
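(For readers on .NET 6 or later, which postdates this answer: Parallel.ForEachAsync combines a degree-of-parallelism cap with async work. A sketch using the question's names; the cap of 50 is again a guess. Requires using System.Collections.Concurrent;)

    // .NET 6+ only: runs the async body with bounded parallelism.
    var results = new ConcurrentBag<Metrics>();

    await Parallel.ForEachAsync(
        filters,
        new ParallelOptions { MaxDegreeOfParallelism = 50 },
        async (filter, ct) =>
        {
            var list = await service.GetPerformanceCounters(filter.SComputerId, filter.WorkSpace);
            foreach (var m in list) results.Add(m);
        });

    var metrics = results.ToList();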

I'm not sure what your ultimate goal is here, but there's surely a more efficient way to do it than sending 3000 separate requests. If there truly is not, then you need to batch them: send a few at a time over a longer period. The whole process should be moved off to an external process, and should be merely scheduled by your web application. You can then use something like SignalR to push updates, and eventually the result, back down to the user when it all completes. A rough sketch of that batching is below.
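A sketch of the batching, sized against the 200-requests-per-30-seconds limit quoted in the question (assuming filters is a List of the question's filter objects, and reusing its names):

    const int BatchSize = 200; // per the documented per-IP limit
    var metrics = new List<Metrics>();

    for (int i = 0; i < filters.Count; i += BatchSize)
    {
        var batch = filters
            .Skip(i)
            .Take(BatchSize)
            .Select(f => service.GetPerformanceCounters(f.SComputerId, f.WorkSpace));

        metrics.AddRange((await Task.WhenAll(batch)).SelectMany(m => m));

        // Crude pacing: wait out the 30-second window before the next batch.
        if (i + BatchSize < filters.Count)
            await Task.Delay(TimeSpan.FromSeconds(30));
    }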