All of a sudden getting lots of Wait Operation Tim

2019-02-10 01:28发布

问题:

Two days ago, with no code changes or changes to the DB, I am not getting a lot (every 5 minutes or so) of errors with The wait operation timed out error with two different underlining full errors on about the pre-login and the other about the post:

System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=21; handshake=14988; ---> System.ComponentModel.Win32Exception: The wait operation timed out

System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Connection Timeout Expired. The timeout period elapsed during the post-login phase. The connection could have timed out while waiting for server to complete the login process and respond; Or it could have timed out while attempting to create multiple active connections. This failure occurred while attempting to connect to the routing destination. The duration spent while attempting to connect to the original server was - [Pre-Login] initialization=5; handshake=3098; [Login] initialization=0; authentication=0; [Post-Login] complete=7; The duration spent while attempting to connect to this server was - [Pre-Login] initialization=20; handshake=5; [Login] initialization=0; authentication=0; [Post-Login] complete=11003; ---> System.ComponentModel.Win32Exception: The wait operation timed out

I am using Entity Framework and my web site is hosted on an Azure Web App. I have done some digging and most SO questions I find about this are NOT related to Entity Framework but ADO.Net the few posts I found lead me updated from a Basic to Standard (S0) service for the DB and creating a GlobalDBConfig with

public class GlobalDBConfig : DbConfiguration
{
    public GlobalDBConfig()
    {
        SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy(2, TimeSpan.FromSeconds(30)));
    }
}

How can I figure out what else is going wrong and fix it? This is a very simple DB with simple queries and very little traffic to the site (less then 1000 visits a DAY)

回答1:

We resolved this issue, along with other types of random timeouts on SQL Azure by switching to "contained users". Using server-level logins on SQL Azure can cause issues:

This is not very efficient as in SQL DB master and user can sit on two different SQL servers potentially in two different machines. Also when a server has multiple user databases then master will be the bottleneck in the login process, and under load this may result in high response time for logins. If Microsoft is updating the software on the machine / server then master will be unavailable for a few seconds and all the logins to the user database can fail too at this time (http://www.sqlindepth.com/contained-users-in-sql-azure-db-v12/)

As in your case, I had my doubts because my database was not under heavy load, but switching to contained users made a tremendous difference anyway.

The SQL to create these users is as follows (run this on the database itself, not on the master database as you would for creating server-level logins):

Create user ContainedUser with password = 'Password'
ALTER AUTHORIZATION ON SCHEMA::[db_owner] TO [ContainedUser]
ALTER ROLE [db_owner] ADD MEMBER [ContainedUser]


回答2:

Here are a few options to try: I strongly recommend going with (1) and (3) if possible

  1. User database firewall rules and contained user authentication
  2. Increase connection timeout to a large value (60-120 seconds?)
  3. If possible update your client drivers to latest version (7.4 and above)


回答3:

We had similar problems and please note that there is no such thing as AUTO scaling on standalone databases on Azure and since you are using Entity Framework, here are some suggestions below

  1. If you are calling Web-API to fetch and transact with your database on Azure SQL, make sure on Azure portal you set the "ALLWAYS ON" option for the Web-API.

  2. Then your client app should probably retry if it fails to connect in the first attempt.

  3. If the database queries are resulting in timeouts due to volume of the data and the indexes not able to catch up with that, you will need to increase the time out of the command executions a bit and most importantly you will need to update the stats on the database and recompile all the objects in the database.