Dynamically change proxy in HttpClient without har

Posted 2020-06-28 02:10

I need to create a multithreaded application that makes requests (POST, GET, etc.). For this purpose I chose HttpClient.

By default it does not support SOCKS proxies. I found that Sockshandler (https://github.com/extremecodetv/SocksSharp) can be used instead of the basic HttpClientHandler, and it allows me to use SOCKS.

But I have a problem. All my requests should be sent through different proxies, which I have parsed from the internet. But the HttpClient handler doesn't support changing proxies dynamically. If I don't have a valid proxy, I need to recreate the HttpClient. That is OK, but with 200 threads it takes a lot of CPU. So what should I do in this situation?

And a second problem. I found this article (https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/), which recommends using a single HttpClient instance for better performance, but that seems impossible in a multithreaded program. Which way is better in this case?

Thanks for the help.

3 Answers
放荡不羁爱自由
#2 · 2020-06-28 02:48

httpclient handler doesn't support changing proxies dynamically.

I'm not sure if that's technically true. Proxy is a read/write property so I believe you could change it (unless that results in a runtime error...I haven't actually tried it to be honest).

UPDATE: I have tried it now and your assertion is technically true. In the sample below, the line that updates UseProxy will fail with "System.InvalidOperationException: 'This instance has already started one or more requests. Properties can only be modified before sending the first request.'" Confirmed on .NET Core and full framework.

    var hch = new HttpClientHandler { UseProxy = false };
    var hc = new HttpClient(hch);
    var resp = await hc.GetAsync(someUri);

    hch.UseProxy = true; // fail!
    hch.Proxy = new WebProxy(someProxy);
    resp = await hc.GetAsync(someUri);

What is certainly true is that you can't set a different proxy per request in a thread-safe way, which is unfortunate.

if I have 200 threads, it takes a lot of cpu

Concurrent asynchronous HTTP calls should not consume extra threads nor CPU. Fire them off using await Task.WhenAll or similar and there is no thread consumed until a response is returned.
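As a rough illustration of that point, the sketch below starts many requests at once and awaits them together with Task.WhenAll. The FakeHandler class, the example.test URLs, and the ConcurrentRequests helper are assumptions made up for this sketch so that it runs without a network; in real code the HttpClient would use a normal (or SOCKS) handler.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the network: replies "ok" after a short asynchronous delay.
public sealed class FakeHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Task.Delay is timer-based, so no thread is blocked while "waiting for the server".
        await Task.Delay(100, cancellationToken).ConfigureAwait(false);
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("ok"),
            RequestMessage = request,
        };
    }
}

public static class ConcurrentRequests
{
    // Starts `count` GET requests without waiting in between, then awaits them all together.
    public static async Task<string[]> FetchAllAsync(HttpClient client, int count)
    {
        var tasks = Enumerable.Range(0, count)
            .Select(i => client.GetStringAsync($"http://example.test/item/{i}"))
            .ToArray();
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

It could be called as `await ConcurrentRequests.FetchAllAsync(new HttpClient(new FakeHandler()), 200)`. All 200 requests are in flight at the same time without a dedicated thread per request.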

And second problem. I found this article...

That's definitely something you need to look out for. However, even if you could set a different proxy per request, the underlying network stack would still need to open a socket for each proxy, so you wouldn't be gaining anything over an HttpClient instance per proxy in terms of the socket exhaustion problem.

The best solution depends on just how many proxies you're talking about here. In the article, the author describes running into problems when the server hit around 4000-5000 open sockets, and no problems around 400 or less. YMMV, but if the number of proxies is no more than a few hundred, you should be safe creating a new HttpClient instance per proxy. If it's more, I would look at throttling your concurrency and testing until you find a number where your server resources can keep up. In any case, make sure that if you need to make multiple calls to the same proxy, you're re-using HttpClient instances for them. A ConcurrentDictionary could be useful for lazily creating and reusing those instances.
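A minimal sketch of that ConcurrentDictionary idea might look like the following. The ProxyClientPool name and the use of the proxy address string as the key are assumptions for illustration; with a SOCKS library the CreateClient method would build that library's handler instead of HttpClientHandler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

// Hypothetical helper: hands out one long-lived HttpClient per proxy address.
public sealed class ProxyClientPool
{
    // Lazy<T> makes sure only one client is ever built per key, even if
    // several threads ask for the same proxy at the same moment.
    private readonly ConcurrentDictionary<string, Lazy<HttpClient>> clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    public HttpClient GetClient(string proxyAddress)
    {
        return clients.GetOrAdd(
            proxyAddress,
            address => new Lazy<HttpClient>(() => CreateClient(address))).Value;
    }

    private static HttpClient CreateClient(string address)
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(address),
            UseProxy = true,
        };
        // disposeHandler: true ties the handler's lifetime to the client's.
        return new HttpClient(handler, disposeHandler: true);
    }
}
```

Each caller asks for `pool.GetClient("http://10.0.0.1:8080")` (a made-up address) and gets the same instance back for the same proxy, so the proxy is never changed on a client that has already sent a request.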

SAY GOODBYE
#3 · 2020-06-28 03:06

With some testing, I confirmed that you can change the proxy through the Address property of WebProxy. The trick is that you have to start each HTTP request before you switch to the next proxy. Here is the sample code:

    // NetworkUtils.AcceptableTimeoutTimeSpan is the answerer's own helper (not shown here).
    private static async Task CommonHttpClient(List<string> proxyList)
    {
        // Placeholder address; it is overwritten before each request below.
        var webproxy = new WebProxy("http://8.8.8.8:8080", false);
        using var handler = new HttpClientHandler()
        {
            Proxy = webproxy,
            UseProxy = true,
        };
        using var client = new HttpClient(handler) {Timeout = NetworkUtils.AcceptableTimeoutTimeSpan};
        // Maps each in-flight request to the proxy it was started with.
        var data = new Dictionary<Task<HttpResponseMessage>, string>();
        foreach (var proxy in proxyList)
        {
            // Point the shared WebProxy at the next proxy, then start the request right away.
            webproxy.Address = new Uri($"http://{proxy}");
            var uri = new Uri("https://api.ipify.org");
            data.Add(client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead), proxy);
        }

        // Handle the responses in completion order.
        while (data.Count > 0)
        {
            var taskFinished = await Task.WhenAny(data.Keys).ConfigureAwait(false);
            var address = data[taskFinished];
            using var resp = await taskFinished.ConfigureAwait(false);
            resp.EnsureSuccessStatusCode();
            var ip = await resp.Content.ReadAsStringAsync().ConfigureAwait(false);
            // api.ipify.org returns the caller's public IP, which should be the proxy's address.
            if (ip != address) throw new InvalidOperationException($"Expected {address}, got {ip}");
            data.Remove(taskFinished);
        }
    }
    private static async Task SeparateHttpClient(List<string> proxyList)
    {
        // One handler and one client per proxy, with all requests running concurrently.
        await Task.WhenAll(proxyList.Select(async proxy =>
        {
            var webproxy = new WebProxy($"http://{proxy}", false);
            using var handler = new HttpClientHandler()
            {
                Proxy = webproxy,
                UseProxy = true,
            };
            using var client = new HttpClient(handler) {Timeout = NetworkUtils.AcceptableTimeoutTimeSpan};
            var uri = new Uri("https://api.ipify.org");
            using var resp = await client.GetAsync(uri).ConfigureAwait(false);
            resp.EnsureSuccessStatusCode();
            var ip = await resp.Content.ReadAsStringAsync().ConfigureAwait(false);
            // The reported public IP should match the proxy the request went through.
            if (ip != proxy) throw new InvalidOperationException($"Expected {proxy}, got {ip}");
        })).ConfigureAwait(false);
    }

    private static async Task TestAsync1()
    {
        // Your list of proxies
        var proxyList = new List<string>() {"1.2.3.4", "5.6.7.8"};

        // TotalSecondsSince() is not a built-in DateTimeOffset method;
        // it is presumably the answerer's own extension (not shown here).
        var start = DateTimeOffset.UtcNow;
        await SeparateHttpClient(proxyList).ConfigureAwait(false);
        Console.WriteLine(start.TotalSecondsSince());

        start = DateTimeOffset.UtcNow;
        await CommonHttpClient(proxyList).ConfigureAwait(false);
        Console.WriteLine(start.TotalSecondsSince());
    }

In my testing, I did not see sharing one HttpClient instance boost performance. It even took longer to complete, even though its code is more optimized (i.e. it uses ResponseHeadersRead (https://www.stevejgordon.co.uk/using-httpcompletionoption-responseheadersread-to-improve-httpclient-performance-dotnet)).

Ridiculous、
#4 · 2020-06-28 03:08

I agree with Todd Menier's answer. But if you use .NET Core, I suggest reading Microsoft's articles on the topic, where they say:

Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. That issue will result in SocketException errors.

It's sad, but they provide a solution:

To address those mentioned issues and make the management of HttpClient instances easier, .NET Core 2.1 introduced a new HttpClientFactory that can also be used to implement resilient HTTP calls by integrating Polly with it.

I looked at the IHttpClientFactory summary block and saw that:

Each call to System.Net.Http.IHttpClientFactory.CreateClient(System.String) is guaranteed to return a new System.Net.Http.HttpClient instance. Callers may cache the returned System.Net.Http.HttpClient instance indefinitely or surround its use in a using block to dispose it when desired. The default System.Net.Http.IHttpClientFactory implementation may cache the underlying System.Net.Http.HttpMessageHandler instances to improve performance. Callers are also free to mutate the returned System.Net.Http.HttpClient instance's public properties as desired.

Let's look at the picture (image not available):

An IHttpClientFactory implementation is injected into some service (CatalogueService or whatever you have made), and an HttpClient is then created via IHttpClientFactory every time you need to make a request (you can even wrap it in a using(...) block), while the HttpMessageHandler is cached in a kind of connection pool.

So you can use HttpClientFactory to create as many HttpClient instances as you need and set the proxy before you make a call. I'd be glad if it helps.

UPDATE: I tried it out and it is not actually what you need. You can implement your own IHttpClientFactory like this:

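using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class Program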
{
    public interface IHttpClientFactory
    {
        HttpClient CreateClientWithProxy(IWebProxy webProxy);
    }

    internal class HttpClientFactory : IHttpClientFactory
    {
        private readonly Func<HttpClientHandler> makeHandler;

        public HttpClientFactory(Func<HttpClientHandler> makeHandler)
        {
            this.makeHandler = makeHandler;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            var handler = this.makeHandler();
            handler.Proxy = webProxy;
            return new HttpClient(handler, true);
        }
    }

    internal class CachedHttpClientFactory : IHttpClientFactory
    {
        private readonly IHttpClientFactory httpClientFactory;
        // Keyed by the proxy object itself: a bare GetHashCode() value is not
        // guaranteed to be unique, so two different proxies could collide.
        private readonly Dictionary<IWebProxy, HttpClient> cache = new Dictionary<IWebProxy, HttpClient>();

        public CachedHttpClientFactory(IHttpClientFactory httpClientFactory)
        {
            this.httpClientFactory = httpClientFactory;
        }

        public HttpClient CreateClientWithProxy(IWebProxy webProxy)
        {
            lock (this.cache)
            {
                // Reuse the client already created for this proxy, if any.
                if (this.cache.TryGetValue(webProxy, out var cached))
                {
                    return cached;
                }

                var result = this.httpClientFactory.CreateClientWithProxy(webProxy);
                this.cache.Add(webProxy, result);
                return result;
            }
        }
    }

    public static async Task Main(string[] args)
    {
        var httpClientFactory = new HttpClientFactory(() => new HttpClientHandler
        {
            UseCookies = true,
            UseDefaultCredentials = true,
        });

        var cachedhttpClientFactory = new CachedHttpClientFactory(httpClientFactory);
        // Placeholder proxy addresses.
        var proxies = new[] {
            new WebProxy()
            {
                Address = new Uri("https://contoso.com"),
            },
            new WebProxy()
            {
                Address = new Uri("https://microsoft.com"),
            },
        };

        var requests = new List<Task<HttpResponseMessage>>();
        foreach (var item in proxies)
        {
            var client = cachedhttpClientFactory.CreateClientWithProxy(item);
            requests.Add(client.GetAsync("http://someAddress.com"));
        }

        // Wait for the requests; otherwise the process can exit before they complete.
        await Task.WhenAll(requests);
    }
}

But be careful with large collections of WebProxy, which can occupy all the connections in the pool.
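One possible way to limit that, assuming .NET Core 2.1 or later where SocketsHttpHandler is available, is to cap and expire the pooled connections of each per-proxy handler. The ProxyHandlers name and the specific limits below are illustrative assumptions, not recommended values.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ProxyHandlers
{
    // Builds a handler for one proxy with a bounded, self-cleaning connection pool.
    public static SocketsHttpHandler Create(Uri proxyAddress)
    {
        return new SocketsHttpHandler
        {
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true,
            // Cap the number of simultaneous connections this handler opens per endpoint.
            MaxConnectionsPerServer = 2,
            // Close connections that sit unused for 30 seconds so idle proxies release sockets.
            PooledConnectionIdleTimeout = TimeSpan.FromSeconds(30),
            // Stop reusing a connection once it is five minutes old.
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        };
    }
}
```

A client for one proxy could then be created with `new HttpClient(ProxyHandlers.Create(new Uri("http://10.0.0.1:8080")))`, where the address is made up for the example.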
