Azure: .NET Framework (with EF code first) and Mic

2020-07-24 05:48发布

问题:

I have an application running in Azure. It has a .NET framework API deployed as an App Service, connected to a Microsoft Sql Server. The .NET framework app uses Entity Framework Code First to access the database.

Usually, the application runs smoothly. However, from time to time, one single endpoint becomes unresponsive (i.e. takes minutes to return an answer). This endpoint is doing a query, through EntityFramework, and retrieving just a few data points from the database. This is the code, executed for 2 to 8 sensors (does not make any difference):

  List<Measurement> result = CreateContext().Measurements
                .Where(m => m.MeasurementSeries.SensorInfoID == sensorId)
                .Where(m => m.MeasurementSeries.StartTime <= toTime && m.MeasurementSeries.EndTime >= fromTime)
                .Where(m => m.Time > fromTime && m.Time <= toTime)
                .OrderByDescending(m => m.Time)
                .ToList();

By looking at the metrics (from ApplicationInsight, too) I see no weird query being executed. I see the expected query, and the database compute utilization jumping to 100%.

The same query, executed from the same code, connected to the same remote database, but with the .NET Framework app running locally, takes less than a second and less than 1% compute from the database. When the query (eventually) finishes, the database compute utilization goes back to zero. The API compute utilization is very low, always. Executing the same SQL query directly to the database takes less than a second.

All other endpoints, some of which make heavier queries to the database, work fine.

I tried re-running the same query multiple times and from multiple sources, restarting, stopping and starting, re-deploying the application. Nothing happens. Last time, after half a day, it just started working again. This time it's been over a day, and nothing.

Do you have any pointer to where to look for the issue? If there are more information I can provide, please let me know, because I may have not thought about them and it can help me figuring out the problem, too.

回答1:

Thanks to @PrebenHuybrechts for providing the tools to solve it, in the question's comments. As suggested, for questions like this one, it's good to check locks, wait stats and execution plan.

TL;DR

Max DOP was set to infinite, and considering I am running Sql Server as serverless with scaling from 2 to 16 vCores, from time to time it did overkill parallelization for a very small query.

The following tools were useful for debugging: sp_whoisactive, sp_BlitzWho.

Long(er) version

First, I used sp_whoisactive to see what was happening while waiting for an answer from the API. I called the endpoint, and run

EXEC sp_WhoIsActive
    @get_task_info = 2

In the results, I found the incriminated query, hanging, with wait_info status of CXPACKET. From this article I figured that CXPACKET means that the master process is waiting for slaves to finish processing, and that Microsoft suggests to set Max DOP (Maximum Degree of Parallelism) to 8. I did that using Microsoft SSMS and it worked like a charm.

I guess that, considering that the same query was used somewhere else to retrieve a bigger amount of data, it was optimizing it for that scenario, making it sadly slow for when the query was retrieving just few datapoints.