Azure Web Role Stress Test - 1000ms blocking opera

Posted 2019-06-03 15:35

Question:

Today, I wanted to simulate waiting for a long-running blocking process (5 to 30 seconds) from within an AsyncController in an MVC3 web role. However, to begin, I just started with 1 second, to get things going. Yes, the wisdom of this is questionable, since the blocking operation cannot currently be run asynchronously on an I/O Completion Port to an external service, but I wanted to see what the performance limit is for this particular situation.

In my web role, I deployed 6 small instances. The only controller was an AsyncController, with two simple methods intended to simulate a 1000ms blocking operation.

The MVC3 web role controller was simply this:

public class MessageController : AsyncController
{
    public void ProcessMessageAsync(string id)
    {
        AsyncManager.OutstandingOperations.Increment();
        Task.Factory.StartNew(() => DoSlowWork());
    }

    public ActionResult ProcessMessageCompleted()
    {
        return View("Message");
    }

    private void DoSlowWork()
    {
        Thread.Sleep(1000);
        AsyncManager.OutstandingOperations.Decrement();
    }
}

Next, I applied stress to the web role from Amazon EC2. Using 12 servers, I ramped the load up slowly and got close to 550 requests/second. Any attempt to push beyond this was met with apparent thread starvation and subsequent errors. I assume that we were hitting the CLR thread limit, which I understand to be 100 threads per CPU. Allowing for some overhead for the AsyncController, an average of 550/6 = 92 requests per second per instance for a 1000ms blocking operation seems to fit that conclusion.

Is this for real? I have seen other people say similar things, where they reached 60 to 80 requests per second per instance with this type of load. The load on this system will consist mainly of longer-running operations, so 92 requests per second at 1000ms will drop sharply when the 5000ms tasks come online.
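A rough way to sanity-check these numbers is sketched below. It assumes that every in-flight request pins exactly one thread for the whole block time and that about 100 threads are usable on a single-core small instance. Both are assumptions rather than measurements, and the class and method names are made up for illustration.

```csharp
using System;

static class ThroughputEstimate
{
    // Upper bound on requests/second when each request holds one
    // thread for blockMs milliseconds (throughput = threads / block time).
    public static double MaxRequestsPerSecond(int availableThreads, int blockMs)
    {
        return availableThreads * 1000.0 / blockMs;
    }

    static void Main()
    {
        // Assumption: the 100-threads-per-CPU figure quoted above,
        // with one core per small instance.
        const int threads = 100;

        foreach (int blockMs in new[] { 1000, 5000, 30000 })
        {
            Console.WriteLine("{0,6} ms block: at most {1:F1} requests/s per instance",
                blockMs, MaxRequestsPerSecond(threads, blockMs));
        }
    }
}
```

Under those assumptions the ceiling is about 100 requests per second per instance at 1000ms, about 20 at 5000ms, and only about 3 at 30 seconds, which is in the same range as the 92 observed here.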

Short of routing the requests for the blocking I/O through multiple separate web role front ends to fan this load out to more cores, is there any way to get higher than this apparent limit of 90 or so requests per second at 1000ms block time? Have I made some kind of obvious error here?

Answer 1:

I'm sorry to have to say this, but you have been misled by all the blogs claiming that simply using Task.Factory.StartNew is the solution to all your problems. Well, it's not.

Load test with Task.Factory.StartNew

Take a look at the following load test I ran on your code (I changed the sleep to 10 seconds instead of 1 second to make it even worse). The test simulates 200 constant users making a total of 2500 requests. Look at how many requests fail due to thread starvation:

As you can see, even if you're using an AsyncController with a Task, thread starvation still happens. Could the long-running operation be the cause?

Load test with TaskCreationOptions.LongRunning

Did you know you can specify if a task is long running or not? Take a look at this question: Strange Behavior When I Don't Use TaskCreationOptions.LongRunning

When you don't use the LongRunning flag, the task is scheduled on a threadpool thread, not its own (dedicated) thread. This is likely the cause of your behavioral change - when you're running without the LongRunning flag in place, you're probably getting threadpool starvation due to other threads in your process.

Let's see what happens if we change 1 line of code:

    public void ProcessMessageAsync(string id)
    {
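        // Register the pending operation before starting the task,
        // so Decrement() can never run ahead of Increment().
        AsyncManager.OutstandingOperations.Increment();
        Task.Factory.StartNew(DoSlowWork, TaskCreationOptions.LongRunning);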
    }

Take a look at the load test. What a difference!

What just happened?

As you can see, the LongRunning option seems to make a big difference. Let's add some logging to see what happens internally:

    public void ProcessMessageAsync(string id)
    {
        Trace.WriteLine(String.Format("Before async call - ThreadID: {0} | IsBackground: {1} | IsThreadPoolThread: {2} | Priority: {3} | ThreadState: {4}",
            Thread.CurrentThread.ManagedThreadId, Thread.CurrentThread.IsBackground,
            Thread.CurrentThread.IsThreadPoolThread, Thread.CurrentThread.Priority, Thread.CurrentThread.ThreadState));
        // Register the pending operation before starting the task.
        AsyncManager.OutstandingOperations.Increment();
        Task.Factory.StartNew(DoSlowWork, TaskCreationOptions.LongRunning);
    }

    ...

    private void DoSlowWork()
    {
        Trace.WriteLine(String.Format("Async call - ThreadID: {0} | IsBackground: {1} | IsThreadPoolThread: {2} | Priority: {3} | ThreadState: {4}",
            Thread.CurrentThread.ManagedThreadId, Thread.CurrentThread.IsBackground,
            Thread.CurrentThread.IsThreadPoolThread, Thread.CurrentThread.Priority, Thread.CurrentThread.ThreadState));
        Thread.Sleep(10000);
        AsyncManager.OutstandingOperations.Decrement();
    }

Without LongRunning:

Before async call - ThreadID: 11 | IsBackground: True | IsThreadPoolThread: True | Priority: Normal | ThreadState: Background
Async call - ThreadID: 11 | IsBackground: True | IsThreadPoolThread: True | Priority: Normal | ThreadState: Background

With LongRunning:

Before async call - ThreadID: 48 | IsBackground: True | IsThreadPoolThread: True | Priority: Normal | ThreadState: Background
Async call - ThreadID: 48 | IsBackground: True | IsThreadPoolThread: False | Priority: Normal | ThreadState: Background

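As you can see, without LongRunning the work runs on a thread from the thread pool, the same pool ASP.NET uses to serve requests, and that causes the starvation. With LongRunning the task runs on a thread that is not a thread-pool thread. While the LongRunning option works great in this case, you should always evaluate whether you really need it.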
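If you want to reproduce this observation outside MVC and Azure, a small console program along these lines should do. It is only a sketch: the class and helper names are made up, and it relies on the behavior of the default task scheduler, which treats LongRunning as a hint to start a dedicated thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LongRunningDemo
{
    // Starts a trivial task with the given options and reports whether
    // its body executed on a thread-pool thread.
    public static bool RanOnThreadPool(TaskCreationOptions options)
    {
        // Wait on a separate TaskCompletionSource instead of the task itself,
        // so the wait cannot end up running the work on the calling thread.
        var tcs = new TaskCompletionSource<bool>();

        Task.Factory.StartNew(
            () => tcs.SetResult(Thread.CurrentThread.IsThreadPoolThread),
            CancellationToken.None,
            options,
            TaskScheduler.Default);

        return tcs.Task.Result;
    }

    static void Main()
    {
        Console.WriteLine("None:        IsThreadPoolThread = {0}",
            RanOnThreadPool(TaskCreationOptions.None));
        Console.WriteLine("LongRunning: IsThreadPoolThread = {0}",
            RanOnThreadPool(TaskCreationOptions.LongRunning));
    }
}
```

With the default scheduler I would expect True for None and False for LongRunning, matching the trace output above. Keep in mind that every LongRunning task costs a full thread, so this moves the pressure off the thread pool rather than removing it.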

Note: Since you're using Windows Azure, you need to take into account that the load balancer will time out after a few minutes of inactivity.