SignalR client disconnect causes high thread usage

2019-07-27 07:41发布

问题:

We have a server and clients that are implemented to communicate via signalr. We have implemented scaleout using SQL server and then later with Redis. When connecting more than around 70 clients to the server all the clients are suddenly gets disconnected. Following is the trace logs from the clients.

08:45:09.5481482 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:19.5763845 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:29.5868786 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:36.5945700 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({"C":"s-0,6052","M":[{"H":"PrintConnectorHub","M":"IsConnectionAlive","A":[true]}]})
08:45:36.6035962 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({"I":"96"})
08:45:39.6197277 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:53.2729622 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Connection Timeout Warning : Notifying user
08:45:53.2729622 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnConnectionSlow
08:45:59.9651515 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Connection Timed-out : Transport Lost Connection
08:45:59.9651515 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: LostConnection
08:45:59.9801625 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnClose()
08:45:59.9821639 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - ChangeState(Connected, Reconnecting)
08:46:00.0612219 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS Connecting to: ws://172.20.22.58/signalr/reconnect?clientProtocol=1.4&transport=webSockets&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=GTySuvds2p5gtlb3q2ul1JijPaQVJf42sv8fWFXea8OqlLWoDv1rSK%2BfDKUowaqJ4Nx9mAb6NAIkHwTpgNiJNjdSO3lr73qA7JyP%2B6Uxc2uVhhWUV8My5wRN5utsoF0k&messageId=s-0%2C6052
08:46:06.5803795 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnError(System.InvalidOperationException: Data cannot be sent because the WebSocket connection is reconnecting.)
08:46:30.0342252 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnError(System.TimeoutException: Couldn't reconnect within the configured timeout of 00:00:30, disconnecting.)
08:46:30.0342252 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Disconnected
08:46:30.0352263 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Transport.Dispose(41219ddc-af97-4738-b75e-07fe6bbd5472)
08:46:30.0362276 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Closed
08:46:36.5536103 - null - ChangeState(Disconnected, Connecting)
08:48:05.8383180 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - WS Connecting to: ws://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=webSockets&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:10.8539527 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport webSockets. System.TimeoutException: Transport timed out trying to connect
08:48:10.8539527 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - WS: OnClose()
08:48:10.8549540 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - SSE: GET http://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=serverSentEvents&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:15.8721679 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport serverSentEvents. System.TimeoutException: Transport timed out trying to connect
08:48:15.8751701 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - LP Connect: http://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=longPolling&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:20.8729482 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport longPolling. System.TimeoutException: Transport timed out trying to connect
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Disconnected
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Transport.Dispose(b4995d7f-c2bd-469a-a54b-4df02f198b9b)
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Closed
08:48:20.8749497 - null - There has not been an active server connection for an extended period of time. Stopping connection.
08:48:20.8749497 - null - OnError(System.TimeoutException: The client has been inactive since 5/8/2017 8:46:28 AM and it has exceeded the inactivity timeout of 00:00:50. Stopping the connection.)

What makes the problem worse is that that when we look in to the server machine resource monitor we can see that w3wp.exe has more than 1000 threads allocated and it keeps getting increased.

When we investigate on worker process in IIS huge number of worker process current requets are shown.

In one occasion IIS threads were gone up to 10,000 threads and hangs the server machine. What will be the cause for this issue?

IIS 8.0 Server - Windows 2012