Azure database concurrent usage issues

2020-05-23 15:49发布

问题:

Just wanted to run this by you all to see if there are any bright ideas as I have exhausted all of my ideas after an entire day, night and morning of searching. The issues we’re encountering invariably centre around database connectivity when under concurrent usage (selenium test), e.g. timeouts, dropped/closed connections, database server unreachable.

The issue does seem to be restricted to Azure as we’re yet to encounter the issue locally even when running the same selenium test on the same code pointing to the same database (SQL Azure) so it would point to it being some issue with outbound database connectivity in SQL Azure. So far we’ve tried the following:

  1. Azure transient fault handling – We have retry logic in place for when there is a temporary issue with the SQL Azure service itself.
  2. Change communication protocol – We’ve tried both TCP and Named Pipes and we encounter the same issue with both.
  3. Adjust database connection timeout interval – We’ve tried increasing this to no avail.
  4. Addition of Multiple Active Result Sets – We’ve add this to the connection string to no avail.
  5. Connection state check on every query – When we return the DataContext we check for its connection and reopen where necessary.
  6. Turned off connection pooling - We’ve also attempted this without success.
  7. Changed design pattern – We even went to the lengths of implementing a Unit of Work design pattern, where the database connections were being fired up and disposed of after every unit of work but this caused issues elsewhere with lazy loading, passing objects into methods and it would have been too substantial a rework at this point.

  8. Change role size – The last thing I can think to try is to up the role size in case of any implicit connection restrictions in Windows Azure – that’s currently deploying so there’s still half a chance it might work!

The site infrastructure is as follows:

  • DataContext class (extends DbContext) which is a Code First EF context.
  • BusinessLayer class contains a private, non-static DataContext. DataContext is constructor injected into each Manager/Helper class.
  • BusinessLayerService class contains a public, thread static BusinessLayer instance.
  • MVC site uses BusinessLayerService.Instance to access the manager classes which query and update the DataContext they’ve been passed.

Any help would be greatly appreciated.

UPDATE: We upped the VM size to Medium and all it did was mean that the same issue took longer to occur.

When the issues started occuring, a team member noted the following exception occured:

InvalidOperationException: Execution of the command requires an open and available connection. The connection's current state is broken.

This started occuring whenever the database was being hit (wasn't specific to a certain area of code).

回答1:

I've encountered this kind of issue before. In my case it was to do with Entity Framework ObjectContexts not being disposed of properly, before eventually too many contexts were spun up and the site toppled over. We identified using the Entity Framework Profiler that there were a LOT of unclosed ObjectContexts when errors were being thrown.

It wasn't feasible for us to move to a Unit of Work design pattern (or similar) as it would have required a rewrite of the business layer, therefore, we resolved it by closing the ObjectContexts manually after every page request. We adopted the approach of disposing the context manually in the End Request event of Global.asax, however, other valid approaches would have been to store the context in the HttpContext or implement an IoC container (such as Castle Windsor) with a "per request" lifestyle.