Timeout exception after async commands and Task.Wh

2019-01-23 18:03发布

问题:

I'm experiencing the so-called Timeout performing HGET company:product:settings, inst: 1, queue: 8, qu=0, qs=8, qc=0, wr=0/0, in=79/1 timeout exception.

It's strange because the same Redis instance and in the same machine is storing data, but it's a specific application which throws this exception. Update: In fact, the same application, one line above receives data from Redis. The issue is with HGET.

Also, I've increased timeouts on multiplexer configuration to 6 seconds with no luck.

In addition, I've checked that IDatabase instance has IsConnected with true value.

How to interpret these error messages and what's the issue behind the whole timeout?

Some background...

I've successfully fixed the issue changing when some code section gets a database (i.e. multiplexer.GetDatabase()).

While multiplexer has an instance per AppDomain as described in StackExchange.Redis documentation, many inversion of control components are creating many instances of IDatabase in their own code. That is, IDatabase instance isn't shared.

Actual code is performing a ListRightPopLeftPush, and after that, it's instantiating an inversion of control component which reads a hash key during component instatiation. If instantiate the whole component before doing the so-called ListRightPopLeftPush, then the whole HashGet won't throw the timeout exception.

It seems like even when ListRightPopLeftPush is executed from other IDatabase instance, it produces some kind of issue in the next IDatabase instance when it comes to perform read operations.

Anyway, my fix doesn't answer the question. I've just added more detailed info to let us find what's the issue and its own solution.

Update

Anyway, the above "fix" won't fix further read accesses to Redis. I'm getting the same timeout exception in further calls. And now in paramater found in exception's message says 60/1.

回答1:

Based on a long discussion in chat, and a lot of digging, it looks like in some obscure scenarios the TPL is hijacking the dedicated reader thread when we are doing things like .TrySetResult (which: we do often). This causes an instant deadlock if you make a synchronous call, since it can't possibly process any socket data if it is busy waiting for a task to complete (which would only ever be completed by itself). We do actually have code in place specifically to prevent this, but it looks like the workaround actually forces it to happen in some other scenarios. Which... is horrible. I will see what I can find. But basically, the problem is that currently, in some limited scenarios, TaskCompletionSource.TrySetResult is giving power to the TPL to run synchronous continuations. This includes Task.WhenAny.