Service Broker messages start to get hung up after

Posted 2019-04-10 04:08

Question:

I have an application that uses Service Broker in SQL Server 2008. About once a day the database's performance starts to take a noticeable hit, and I have determined that Service Broker is the cause. If I hard reset all broker connections using the following commands:

ALTER DATABASE [RegencyEnterprise] SET OFFLINE WITH ROLLBACK IMMEDIATE
ALTER DATABASE [RegencyEnterprise] SET ONLINE

Then the performance returns to normal until about the next day. I have also noticed that when performance is poor, running the following query returns a large number (around 1000 currently) of conversations that are stuck in the STARTED_OUTBOUND state:

SELECT * FROM sys.conversation_endpoints
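
With this many rows, a per-state count may be easier to read than the raw output. A minimal sketch, assuming only the documented state_desc column of sys.conversation_endpoints:

```sql
-- Count conversation endpoints per state to see where they pile up
-- (for example, how many are sitting in STARTED_OUTBOUND).
SELECT state_desc,
       COUNT(*) AS endpoint_count
FROM sys.conversation_endpoints
GROUP BY state_desc
ORDER BY endpoint_count DESC;
```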

Also, the following queries return no rows:

SELECT * FROM sys.dm_qn_subscriptions
SELECT * FROM sys.transmission_queue

Performance seems fine when this query returns plenty of rows. Problems only occur when conversations are in the STARTED_OUTBOUND state and stay stuck there.

The only configuration I have done to the Service Broker on my SQL Server 2008 instance was to run the following command:

ALTER DATABASE RegencyEnterprise SET ENABLE_BROKER

Digging through the SQL error log, I have found this entry over 1000 times as well:

07/11/2013 01:00:02,spid27s,Unknown,The query notification dialog on conversation handle '{6DFE46F5-25E9-E211-8DC8-00221994D6E9}.' closed due to the following error: '<?xml version="1.0"?><Error xmlns="http://schemas.microsoft.com/SQL/ServiceBroker/Error"><Code>-8490</Code><Description>Cannot find the remote service &apos;SqlQueryNotificationService-cb4e7a77-58f3-4f93-95c1-261954d3385a&apos; because it does not exist.</Description></Error>'.

I also see this error a dozen or so times throughout the log, though I believe I can fix this just by creating a master key in the database:

06/26/2013 14:25:01,spid116,Unknown,Service Broker needs to access the master key in the database '<Database name>'. Error code:26. The master key has to exist and the service master key encryption is required.
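
If the database master key really is missing, creating it could look roughly like the sketch below. The password is a placeholder, and the existence check against sys.symmetric_keys is an assumption about how one might guard the statement; neither comes from the original post.

```sql
USE RegencyEnterprise;

-- Create the database master key only if it does not exist yet.
-- '##MS_DatabaseMasterKey##' is the name SQL Server gives the database master key.
IF NOT EXISTS (SELECT 1
               FROM sys.symmetric_keys
               WHERE name = N'##MS_DatabaseMasterKey##')
BEGIN
    -- Placeholder password: replace with a strong secret before running.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<replace with a strong password>';
END
```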

I am thinking the number of these errors may be related to the number of conversations that are staying stuck in the queue. Here is the C# code I am using to subscribe to the query notifications:

private void EstablishSqlConnection(
    String storedProcedureName,
    IEnumerable<SqlParameter> parameters,
    Action sqlQueryOperation,
    String serviceCallName,
    Int32 timeout,
    params MultipleResult[] results)
{
    SqlConnection storeConnection = (SqlConnection) ((EntityConnection) ObjectContext.Connection).StoreConnection;
    try
    {
        using (SqlCommand command = storeConnection.CreateCommand())
        {
            command.Connection = storeConnection;
            storeConnection.Open();

            SqlParameter[] sqlParameters = parameters.ToArray();
            command.CommandText = storedProcedureName;
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddRange(sqlParameters);

            if (sqlQueryOperation != null)
            {
                // Register a sql dependency with the SQL query.
                SqlDependency sqlDependency = new SqlDependency(command, null, timeout);
                sqlDependency.OnChange += OnSqlDependencyNotification;
            }

            using (DbDataReader reader = command.ExecuteReader())
            {
                results.ForEach(result => result.MapResults(this, reader));
            }
        }
    }
    finally
    {
        storeConnection.Close();
    }
}

Here is how I handle the notification:

    public static void OnSqlDependencyNotification(object sender, SqlNotificationEventArgs e)
    {
        if (e.Info == SqlNotificationInfo.Invalid)
        {
            // If we failed to register the SqlDependency, log an error
            <Error is logged here...>

            // If we get here, we are not in a valid state to requeue the sqldependency. However,
            // we are on an async thread and should NOT throw an exception. Instead we just return
            // here, as we have already logged the error to the database. 
            return;
        }

        // If we are able to find and remove the listener, invoke the query operation to re-run the query.
        <Handle notification here...>
    }

Does anyone know what can cause the broker's connections to get in this state? Or what tools I could use to go about trying to figure out what is causing this? I currently only have a single web server that is registering to its notifications, so my scenario is not overly complex.

UPDATE:

OK, so I have determined from this post that the error "Cannot find the remote service ... because it does not exist" occurs because SqlDependency does not clean up after itself properly. The broker is still trying to send notifications to my application after the service has ended. So it sounds like I just need a way to clear out whatever was not cleaned up when my app starts, before calling SqlDependency.Start(). However, I have not found a way to do this other than my original method above, which takes the database offline and is not acceptable. Does anyone know how to clean this up?
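
To see what SqlDependency has left behind, it may help to list the auto-generated services and the queues they are bound to. A minimal sketch, assuming the leftovers follow the 'SqlQueryNotificationService-' naming seen in the error log above:

```sql
-- List auto-generated SqlDependency services and their queues that still exist.
SELECT s.name AS service_name,
       q.name AS queue_name
FROM sys.services AS s
JOIN sys.service_queues AS q
    ON q.object_id = s.service_queue_id
WHERE s.name LIKE N'SqlQueryNotificationService-%';
```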

Answer 1:

I have found an acceptable approach to solving this issue. First, I migrated my code away from SqlDependency and I am now using SqlNotificationRequest instead. Doing this prevents Broker Queues and Services from being created/destroyed at unexpected times.

Even with this, however, when my application exits a few conversations still don't get marked as closed, because the original endpoint that set up the notification is no longer there. Therefore, each time my server re-initializes my code, I clear out the existing conversations.

This adjustment has reduced the number of stuck connections from over 1000 per day, which I had to kill manually, to a maximum of about 20 at any time. I highly recommend using SqlNotificationRequest instead of SqlDependency.
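
SqlNotificationRequest does not create broker objects on its own, so the queue and service it targets have to exist beforehand. A minimal sketch of that one-time setup, assuming the hypothetical names AppNotificationQueue and AppNotificationService (neither appears in the answer above):

```sql
-- One-time setup of a fixed queue and service for query notifications.
-- The object names are illustrative assumptions, not taken from the answer.
IF NOT EXISTS (SELECT 1 FROM sys.service_queues WHERE name = N'AppNotificationQueue')
    CREATE QUEUE dbo.AppNotificationQueue;

IF NOT EXISTS (SELECT 1 FROM sys.services WHERE name = N'AppNotificationService')
    CREATE SERVICE AppNotificationService
        ON QUEUE dbo.AppNotificationQueue
        -- Built-in contract used for query notification messages.
        ([http://schemas.microsoft.com/SQL/Notifications/PostQueryNotification]);
```

The request's options string would then name this service (something like service=AppNotificationService;local database=RegencyEnterprise), and the application would read notifications from the queue itself, for instance with WAITFOR (RECEIVE ...). Because the service outlives the application, notifications no longer target a service that has been dropped.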



Answer 2:

I have found a way to clear out the conversations that are stuck. I retrieve all of the generated SqlDependency queues that still exist and iterate over the conversations that don't belong to any of these and end those conversations. Below is the code:

SET NOCOUNT OFF;
DECLARE @handle UniqueIdentifier
DECLARE @count INT = 0

-- Retrieve orphaned conversation handles that belong to auto-generated SqlDependency queues and iterate over each of them
DECLARE handleCursor CURSOR
FOR 
SELECT [conversation_handle]
FROM sys.conversation_endpoints WITH(NOLOCK)
WHERE
    far_service COLLATE SQL_Latin1_General_CP1_CI_AS like 'SqlQueryNotificationService-%' COLLATE SQL_Latin1_General_CP1_CI_AS AND
    far_service COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT name COLLATE SQL_Latin1_General_CP1_CI_AS FROM sys.service_queues)

DECLARE @Rows INT
SELECT @Rows = COUNT(*) FROM sys.conversation_endpoints WITH(NOLOCK)
WHERE
    far_service COLLATE SQL_Latin1_General_CP1_CI_AS like 'SqlQueryNotificationService-%' COLLATE SQL_Latin1_General_CP1_CI_AS AND
    far_service COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT name COLLATE SQL_Latin1_General_CP1_CI_AS FROM sys.service_queues)

WHILE @Rows > 0
BEGIN
    OPEN handleCursor

    FETCH NEXT FROM handleCursor 
    INTO @handle

    BEGIN TRANSACTION

    WHILE @@FETCH_STATUS = 0
    BEGIN

        -- End the conversation and clean up any remaining references to it
        END CONVERSATION @handle WITH CLEANUP

        -- Move to the next item
        FETCH NEXT FROM handleCursor INTO @handle
        SET @count= @count+1
    END

    COMMIT TRANSACTION
    print @count

    CLOSE handleCursor;

    IF @count > 100000
    BEGIN
        BREAK;
    END

    SELECT @Rows = COUNT(*) FROM sys.conversation_endpoints WITH(NOLOCK)
    WHERE
        far_service COLLATE SQL_Latin1_General_CP1_CI_AS like 'SqlQueryNotificationService-%' COLLATE SQL_Latin1_General_CP1_CI_AS AND
        far_service COLLATE SQL_Latin1_General_CP1_CI_AS NOT IN (SELECT name COLLATE SQL_Latin1_General_CP1_CI_AS FROM sys.service_queues)
END
DEALLOCATE handleCursor;
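
Before running a cleanup like this, a read-only preview of what it would end can be useful. The sketch below is an assumption-laden variant: it compares far_service against sys.services rather than sys.service_queues (the script above appears to rely on SqlDependency giving the queue and the service the same name), and it uses COLLATE DATABASE_DEFAULT on both sides to avoid a collation conflict.

```sql
-- Dry run: list orphaned query-notification conversations without ending them.
SELECT ce.conversation_handle,
       ce.state_desc,
       ce.far_service
FROM sys.conversation_endpoints AS ce
WHERE ce.far_service LIKE N'SqlQueryNotificationService-%'
  AND NOT EXISTS (
        SELECT 1
        FROM sys.services AS s
        WHERE s.name COLLATE DATABASE_DEFAULT
            = ce.far_service COLLATE DATABASE_DEFAULT
      );
```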


Answer 3:

STARTED_OUTBOUND means "SQL Server processed a BEGIN CONVERSATION for this conversation, but no messages have yet been sent" (from Books Online). It looks like you are creating conversations that are then never used, so they never get closed.

I'm not entirely sure why that would degrade performance, though.