Parallel.ForEach with HttpClient and ContinueWith

Posted 2019-02-14 18:39

Question:

I have a method that attempts to download data from several URLs in parallel and return an IEnumerable of the deserialized objects.

The method looks like this:

    public IEnumerable<TContent> DownloadContentFromUrls(IEnumerable<string> urls)
    {
        var list = new List<TContent>();

        Parallel.ForEach(urls, url =>
        {
            lock (list)
            {
                _httpClient.GetAsync(url).ContinueWith(request =>
                {
                    var response = request.Result;
                    //todo ensure success?

                    response.Content.ReadAsStringAsync().ContinueWith(text =>
                    {
                        var results = JObject.Parse(text.Result)
                            .ToObject<IEnumerable<TContent>>();

                        list.AddRange(results);
                    });
                });
            }
        });

        return list;
    }

In my unit test (where I stub out _httpClient to return a known set of text), I get:

Sequence contains no elements

This is because the method is returning before the tasks have completed.

If I add .Wait() on the end of my .ContinueWith() calls, it passes, but I'm sure that I'm misusing the API here...
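The early return can be reproduced without any HTTP calls. Parallel.ForEach only waits for each lambda body to return, and the body returns as soon as the continuation has been scheduled, not when it has run. The sketch below is a hypothetical stand-in, not the code from the question: Task.Delay plays the role of GetAsync, and the names EarlyReturnDemo, FireAndForget, and WaitForContinuation are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class EarlyReturnDemo
{
    // Mirrors the shape of the method in the question: the continuation
    // is scheduled, but nothing waits for it before the method returns.
    public static List<int> FireAndForget()
    {
        var list = new List<int>();
        Task.Delay(200).ContinueWith(_ => { lock (list) { list.Add(42); } });
        return list; // the continuation has most likely not run yet
    }

    // Same work, but the caller blocks until the continuation has finished.
    public static List<int> WaitForContinuation()
    {
        var list = new List<int>();
        Task.Delay(200).ContinueWith(_ => { lock (list) { list.Add(42); } }).Wait();
        return list; // the continuation has completed, so the item is present
    }
}
```

Calling .Wait() makes the test pass for the reason shown in the second method, but it blocks a thread per URL inside Parallel.ForEach, which is why it feels like a misuse of the API.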

Answer 1:

If you want a blocking call that downloads in parallel using HttpClient.GetAsync, you can implement it like this:

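    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    public IEnumerable<TContent> DownloadContentFromUrls<TContent>(IEnumerable<string> urls)
    {
        // Thread-safe collection: continuations may run concurrently.
        var queue = new ConcurrentQueue<TContent>();

        using (var client = new HttpClient())
        {
            // Start one request per URL, then block until every continuation has finished.
            Task.WaitAll(urls.Select(url =>
            {
                return client.GetAsync(url).ContinueWith(responseTask =>
                {
                    // The request has completed here, so .Result does not wait on GetAsync.
                    var json = responseTask.Result.Content.ReadAsStringAsync().Result;
                    var content = JsonConvert.DeserializeObject<IEnumerable<TContent>>(json);

                    foreach (var c in content)
                        queue.Enqueue(c);
                });
            }).ToArray());
        }

        return queue;
    }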

This creates an array of tasks, one per URL, each representing a GetAsync/deserialize operation, and Task.WaitAll blocks until all of them have completed. It assumes that each URL returns a JSON array of TContent. An empty array or a single-element array will deserialize fine, but a single object that is not wrapped in an array will not.
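If the caller does not actually need to block, the same idea can be written with async/await and Task.WhenAll. The following is only a sketch under assumptions that are not in the answer above: the class name ContentDownloader and the method name DownloadContentFromUrlsAsync are hypothetical, the HttpClient is passed in by the caller rather than created per call, and EnsureSuccessStatusCode is added as one possible way to handle the "ensure success?" TODO from the question. It uses Json.NET (JsonConvert), as the answer does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class ContentDownloader
{
    // Hypothetical non-blocking variant: awaits all downloads instead of using Task.WaitAll.
    public static async Task<IReadOnlyList<TContent>> DownloadContentFromUrlsAsync<TContent>(
        HttpClient client, IEnumerable<string> urls)
    {
        // ToArray forces the lazy Select to run, so every request starts before any is awaited.
        var downloads = urls.Select(async url =>
        {
            using (var response = await client.GetAsync(url).ConfigureAwait(false))
            {
                // Throws HttpRequestException for non-success status codes.
                response.EnsureSuccessStatusCode();

                var json = await response.Content.ReadAsStringAsync().ConfigureAwait(false);

                // Assumes each URL returns a JSON array of TContent; a JSON null yields an empty list.
                return JsonConvert.DeserializeObject<List<TContent>>(json) ?? new List<TContent>();
            }
        }).ToArray();

        // WhenAll returns the per-URL results in the same order as the input tasks.
        var batches = await Task.WhenAll(downloads).ConfigureAwait(false);
        return batches.SelectMany(batch => batch).ToList();
    }
}
```

Because each task returns its own list and the lists are merged only after Task.WhenAll completes, no lock or concurrent collection is needed. A unit test can await the method directly, for example `var items = await ContentDownloader.DownloadContentFromUrlsAsync<MyType>(client, urls);`, where MyType is a placeholder for your own type.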