HttpClient.SendAsync using the thread-pool instead

2020-02-01 04:11发布

问题:

So I've been digging up on the implementation of HttpClient.SendAsync via Reflector. What I intentionally wanted to find out was the flow of execution of these methods, and to determine which API gets called to execute the asynchronous IO work.

After exploring the various classes inside HttpClient, I saw that internally it uses HttpClientHandler which derives from HttpMessageHandler and implements its SendAsync method.

This is the implementation of HttpClientHandler.SendAsync:

protected internal override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
    if (request == null)
    {
        throw new ArgumentNullException("request", SR.net_http_handler_norequest);
    }

    this.CheckDisposed();
    this.SetOperationStarted();

    TaskCompletionSource<HttpResponseMessage> source = new TaskCompletionSource<HttpResponseMessage>();

    RequestState state = new RequestState 
    {
        tcs = source,
        cancellationToken = cancellationToken,
        requestMessage = request
    };

    try
    {
        HttpWebRequest request2 = this.CreateAndPrepareWebRequest(request);
        state.webRequest = request2;
        cancellationToken.Register(onCancel, request2);

        if (ExecutionContext.IsFlowSuppressed())
        {
            IWebProxy proxy = null;

            if (this.useProxy)
            {
                proxy = this.proxy ?? WebRequest.DefaultWebProxy;
            }
            if ((this.UseDefaultCredentials || (this.Credentials != null)) || ((proxy != null) && (proxy.Credentials != null)))
            {
                this.SafeCaptureIdenity(state);
            }
        }

        Task.Factory.StartNew(this.startRequest, state);
    }
    catch (Exception exception)
    {
        this.HandleAsyncException(state, exception);
    }
    return source.Task;
}

What I found weird is that the above uses Task.Factory.StartNew to execute the request while generating a TaskCompletionSource<HttpResponseMessage> and returning the Task created by it.

Why do I find this weird? well, we go on alot about how I/O bound async operations have no need for extra threads behind the scenes, and how its all about overlapped IO.

Why is this using Task.Factory.StartNew to fire an async I/O operation? this means that SendAsync isn't only using pure async control flow to execute this method, but spinning a ThreadPool thread "behind our back" to execute its work.

回答1:

this.startRequest is a delegate that points to StartRequest which in turn uses HttpWebRequest.BeginGetResponse to start async IO. HttpClient is using async IO under the covers, just wrapped in a thread-pool Task.

That said, note the following comment in SendAsync

// BeginGetResponse/BeginGetRequestStream have a lot of setup work to do before becoming async
// (proxy, dns, connection pooling, etc).  Run these on a separate thread.
// Do not provide a cancellation token; if this helper task could be canceled before starting then 
// nobody would complete the tcs.
Task.Factory.StartNew(startRequest, state);

This works around a well-known problem with HttpWebRequest: Some of its processing stages are synchronous. That is a flaw in that API. HttpClient is avoiding blocking by moving that DNS work to the thread-pool.

Is that good or bad? It is good because it makes HttpClient non-blocking and suitable for use in a UI. It is bad because we are now using a thread for long-running blocking work although we expected to not use threads at all. This reduces the benefits of using async IO.

Actually, this is a nice example of mixing sync and async IO. There is nothing inherently wrong with using both. HttpClient and HttpWebRequest are using async IO for long-running blocking work (the HTTP request). They are using threads for short-running work (DNS, ...). That's not a bad pattern in general. We are avoiding most blocking and we only have to make a small part of the code async. A typical 80-20 trade-off. It is not good to find such things in the BCL (a library) but in application level code that can be a very smart trade-off.

It seems it would have been preferable to fix HttpWebRequest. Maybe that is not possible for compatibility reasons.