ProxySelector changes URL's scheme from https:

Posted 2019-02-19 08:08

Question:

I need to access Facebook, but all outgoing communication is blocked on our server, so I have to use a proxy.

I initialize proxies with:

ProxySelector.setDefault(new ConfigurableProxySelector(mapping));

Proxy type is HTTP, proxy host and port are working (confirmed by simple wget test).

I'm trying to do this:

HttpClient httpClient = new HttpClient();
HttpMethod method = new GetMethod("https://graph.facebook.com:443");

int status = httpClient.executeMethod(method);

Now, in my class ConfigurableProxySelector I have a select method with a breakpoint on it:

public List<Proxy> select(URI uri) {
...
}

So, using HttpClient I make a request, which should be proxied, and the code stops at the breakpoint in the select() method of ConfigurableProxySelector.

But what is strange is that uri.scheme = "socket" and .toString() gives "socket://graph.facebook.com:443" instead of "https://graph.facebook.com:443".

Because the ProxySelector has a mapping for "https://" and not for "socket://", it does not find a proxy and the request ends with "Connection refused". What is also strange is that the select() method is called 4 times before execution ends with "Connection refused".

Any help would be appreciated.

Answer 1:

Apache HTTP Client 3.1 will not natively honor HTTP Proxies returned from the default ProxySelector or user implementations.

Quick Summary of ProxySelector

ProxySelector is a service class which selects and returns a suitable Proxy for a given URL based on its scheme. For example, a request for http://somehost will try to provide an HTTP proxy if one is defined. The default ProxySelector can be configured at runtime using System Properties, such as http.proxyHost and http.proxyPort.
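As a rough illustration of the summary above, the sketch below sets the two system properties and then asks the default ProxySelector about an http URI and a socket URI. The proxy host proxy.example.com and port 8080 are placeholders, not values from the question, and the class name DefaultSelectorDemo is made up. It assumes the JDK's stock default selector and that no SOCKS properties are set, in which case the http lookup should yield an HTTP proxy and the socket lookup should yield a direct connection.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class DefaultSelectorDemo {

    // Returns the first proxy the default ProxySelector offers for the given URI.
    static Proxy firstProxyFor(String uri) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(uri));
        return proxies.get(0);
    }

    public static void main(String[] args) {
        // Placeholder proxy settings (an assumption, not taken from the question).
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "8080");

        // Lookup keyed on the http scheme.
        System.out.println("http   -> " + firstProxyFor("http://somehost/"));
        // Lookup keyed on the socket scheme, as used for raw TCP sockets.
        System.out.println("socket -> " + firstProxyFor("socket://somehost:80"));
    }
}
```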

HttpURLConnection

An instance of HttpURLConnection checks the default ProxySelector more than once: first to select a proxy for the http or https scheme, then again when it builds the raw TCP socket, this time using the socket scheme. A SOCKS proxy could be used to proxy a raw TCP socket, but SOCKS proxies are not often found in corporate environments, so a raw TCP socket will usually receive no proxy.
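The socket-scheme lookup can be observed without any HTTP library. The sketch below installs a recording ProxySelector, opens a plain java.net.Socket to a throwaway local ServerSocket, and collects the schemes the selector was asked about. SocketSchemeDemo and RecordingSelector are illustrative names, and the sketch assumes a stock JDK whose default socket implementation consults the default ProxySelector before connecting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SocketSchemeDemo {

    // Records every URI it is asked about and always answers "no proxy".
    static class RecordingSelector extends ProxySelector {
        final List<URI> seen = new ArrayList<>();

        @Override
        public List<Proxy> select(URI uri) {
            seen.add(uri);
            return Collections.singletonList(Proxy.NO_PROXY);
        }

        @Override
        public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
            // Nothing to do in this sketch.
        }
    }

    // Opens a plain TCP connection to a local server and returns the schemes
    // of the URIs that were passed to the selector while connecting.
    static List<String> schemesSeenByPlainSocket() {
        ProxySelector previous = ProxySelector.getDefault();
        RecordingSelector recorder = new RecordingSelector();
        ProxySelector.setDefault(recorder);
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // Connected; no data needs to be exchanged.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Restore whatever selector was installed before.
            ProxySelector.setDefault(previous);
        }
        List<String> schemes = new ArrayList<>();
        for (URI uri : recorder.seen) {
            schemes.add(uri.getScheme());
        }
        return schemes;
    }

    public static void main(String[] args) {
        System.out.println(schemesSeenByPlainSocket());
    }
}
```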

HTTP Client 3.1

HC 3.1, on the other hand, never checks the default ProxySelector for the http/https schemes. It does check at a later point, for the socket scheme, when it eventually builds the raw socket. This is the request you are seeing. It also means the system properties http.proxyHost and http.proxyPort are ineffective, which is obviously not ideal for most people who only have an HTTP/HTTPS proxy.
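To make the mismatch concrete, here is a minimal sketch of a selector keyed on URI scheme. It is only a guess at how something like ConfigurableProxySelector might behave, since the question does not show its body; SchemeProxySelector and the proxy address are hypothetical. A mapping that only has an entry for https finds nothing when it is asked about socket://graph.facebook.com:443.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical selector that looks up a proxy by URI scheme only.
public class SchemeProxySelector extends ProxySelector {

    private final Map<String, Proxy> byScheme;

    public SchemeProxySelector(Map<String, Proxy> byScheme) {
        this.byScheme = new HashMap<>(byScheme);
    }

    @Override
    public List<Proxy> select(URI uri) {
        // A scheme without a mapping falls through to a direct connection.
        Proxy proxy = byScheme.get(uri.getScheme());
        return Collections.singletonList(proxy != null ? proxy : Proxy.NO_PROXY);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Nothing to do in this sketch.
    }

    public static void main(String[] args) {
        Map<String, Proxy> mapping = new HashMap<>();
        // Placeholder proxy address (an assumption, not from the question).
        mapping.put("https", new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved("proxy.example.com", 8080)));
        ProxySelector selector = new SchemeProxySelector(mapping);

        // The lookup HttpURLConnection would make first.
        System.out.println(selector.select(URI.create("https://graph.facebook.com:443")));
        // The only lookup HC 3.1 makes.
        System.out.println(selector.select(URI.create("socket://graph.facebook.com:443")));
    }
}
```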

To work around this, you have two options: define a proxy on each HC 3.1 connection, or implement your own HC 3.1 HttpConnectionManager.

HttpConnectionManager

The HttpConnectionManager is responsible for building connections for the HC 3.1 client.

The default HC 3.1 connection manager can be extended so that it looks for a suitable proxy from a ProxySelector (default or custom) when building the request, in the same way HttpURLConnection does:

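import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpConnection;
import org.apache.commons.httpclient.SimpleHttpConnectionManager;

public class MyHTTPConnectionManager extends SimpleHttpConnectionManager {
    @Override
    public HttpConnection getConnectionWithTimeout(
            HostConfiguration hostConfiguration, long timeout) {
        HttpConnection hc = super.getConnectionWithTimeout(hostConfiguration, timeout);

        try {
            // Ask the selector using the real http/https URL of the target host.
            URI uri = new URI(hostConfiguration.getHostURL());
            List<Proxy> hostProxies = ProxySelector.getDefault().select(uri);

            // Use the first HTTP proxy. A DIRECT entry has no address, and a
            // SOCKS entry is not an HTTP proxy, so both are skipped.
            for (Proxy proxy : hostProxies) {
                if (proxy.type() == Proxy.Type.HTTP) {
                    InetSocketAddress sa = (InetSocketAddress) proxy.address();
                    hc.setProxyHost(sa.getHostName());
                    hc.setProxyPort(sa.getPort());
                    break;
                }
            }
        } catch (URISyntaxException e) {
            // Unparseable host URL: fall back to the unproxied connection.
        }
        return hc;
    }
}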

Then, when you create an HC 3.1 client, use your new connection manager:

HttpClient client = new HttpClient(new MyHTTPConnectionManager() );


Answer 2:

It's not the ProxySelector that changes the scheme, but the SocketFactory that opens the Socket. If the SocketFactory is null, a SOCKS socket is created by default, and that only allows SOCKS proxies. I don't know much about Sockets and cannot tell you whether there's a way to make it work with HTTP proxies.

But using another approach may help, since Apache HttpClient seems to have its own way to configure proxies.

client.getHostConfiguration().setProxy(proxyHost, proxyPort);

if (proxyUser != null) {
    client.getState().setProxyCredentials(new AuthScope(proxyHost, proxyPort), 
        new UsernamePasswordCredentials(proxyUser, proxyPassword));
}