I am unable to properly set the User-Agent header for an HTTPS connection. From what I've gathered, HTTP request headers can be set through either the -Dhttp.agent VM option or URLConnection.setRequestProperty(). However, setting the user-agent through the VM option causes " Java/[version]" to be appended to whatever the value of http.agent is, while setRequestProperty() only seems to work for HTTP connections, not HTTPS (at least when I tried it).
java.net.URL url = new java.net.URL( "https://www.google.com" );
java.net.URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
conn.connect();
java.io.BufferedReader serverResponse = new java.io.BufferedReader(new java.io.InputStreamReader(conn.getInputStream()));
System.out.println(serverResponse.readLine());
serverResponse.close();
I've found/verified the problem by inspecting HTTP communications using Wireshark. Is there any way around this?
Update: Additional Info
It seems that I didn't look deep enough into the communication. The code is running from behind a proxy, so the communication observed is against the proxy (set through -Dhttps.proxyHost), not the target website (google.com). Also, during an HTTPS connection the request method is CONNECT, not GET. Here is a Wireshark capture of an HTTPS communication attempt. As I mentioned above, the user-agent is set through -Dhttp.agent because URLConnection.setRequestProperty() has no effect (with it, the user-agent is just Java/1.7.0). In this capture, notice the appended Java/1.7.0. The question remains the same: why is this happening and how do I get around it?
CONNECT www.google.com:443 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0 Java/1.7.0
Host: www.google.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Proxy-Connection: keep-alive
HTTP/1.1 403 Forbidden
X-Bst-Request-Id: MWPwwh:m7d:39175
X-Bst-Info: ch=req,t=1366218861,h=14g,p=4037_7213:1_156,f=PEFilter,r=PEBlockCatchAllRule,c=1905,v=7.8.14771.200 1363881886
Content-Type: text/html; charset=utf-8
Pragma: No-cache
Content-Language: en
Cache-Control: No-cache
Content-Length: 2491
By the way, the request is forbidden because the proxy filters on user-agent; the appended Java/1.7.0 is what causes the rejection. I appended Java/1.7.0 to the user-agent of a plain HTTP connection and the proxy refused that connection too. I hope I'm not going crazy :).
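For context, the setup described above amounts to roughly the following (the proxy host and port are placeholders):
System.setProperty("https.proxyHost", "proxy.example.com"); // placeholder proxy
System.setProperty("https.proxyPort", "8080");
// http.agent is read only once, in a static initializer, so this must run
// before the first HTTP(S) connection is made.
System.setProperty("http.agent",
        "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");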
I've found/verified the problem by inspecting HTTP communications using Wireshark. Is there any way around this?
This is not possible. Communication over an SSL socket is completely obscured from casual observation by the encryption protocol. Using packet capture software you will be able to view the initiation of the SSL connection and the exchange of encrypted packets, but the content of those packets can only be extracted at the other end of the connection (the server). If this were not the case then the HTTPS protocol as a whole would be broken, as the whole point of it is to secure HTTP communications from man-in-the-middle type attacks (where in this case the MITM is the packet sniffer).
Example Capture of an HTTPS request (partial):
.n....E... .........../..5..3..9..2..8..
..............@........................Ql.{...b....OsR..!.4.$.T...-.-.T....Q...M..Ql.{...LM..L...um.M...........s. ...n...p^0}..I..G4.HK.n......8Y...............E...A..>...0...0.........
).s.......0
..*.H..
.....0F1.0...U....US1.0...U.
.
Google Inc1"0 ..U....Google Internet Authority0..
130327132822Z.
131231155850Z0h1.0...U....US1.0...U...
California1.0...U...
Mountain View1.0...U.
.
Google Inc1.0...U....www.google.com0..0
Theoretically, the only way to know if your User-Agent header is actually being excluded is if you have access to the Google servers, but in actuality there is nothing in either the HTTPS specification or Java's implementation of it that excludes headers that would normally have been sent over HTTP.
Example Capture of HTTP request:
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0
Host: www.google.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Both example captures were generated with the exact same code:
URL url = new URL(target);
URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent",
"Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
conn.connect();
BufferedReader serverResponse = new BufferedReader(
new InputStreamReader(conn.getInputStream()));
System.out.println(serverResponse.readLine());
serverResponse.close();
Except that for HTTPS the target was "https://www.google.com", and for HTTP it was "http://www.google.com".
Edit 1:
Based on your updated question: using the -Dhttp.agent property does indeed append 'Java/version' to the user-agent header, as described by the following documentation:
http.agent (default: “Java/<version>”)
Defines the string sent in the User-Agent request header in http requests. Note that the string “Java/<version>” will be appended to the one provided in the property (e.g. if -Dhttp.agent=”foobar” is used, the User-Agent header will contain “foobar Java/1.5.0” if the version of the VM is 1.5.0). This property is checked only once at startup.
The 'offending' code is in a static initializer block of sun.net.www.protocol.http.HttpURLConnection:
static {
// ...
String agent = java.security.AccessController
.doPrivileged(new sun.security.action.GetPropertyAction(
"http.agent"));
if (agent == null) {
agent = "Java/" + version;
} else {
agent = agent + " Java/" + version;
}
userAgent = agent;
// ...
}
An obscene way around this 'problem' is this snippet of code, which I 1000% recommend you not use:
protected void forceAgentHeader(final String header) throws Exception {
final Class<?> clazz = Class
.forName("sun.net.www.protocol.http.HttpURLConnection");
final Field field = clazz.getField("userAgent");
field.setAccessible(true);
Field modifiersField = Field.class.getDeclaredField("modifiers");
modifiersField.setAccessible(true);
modifiersField.setInt(field, field.getModifiers() & ~Modifier.FINAL);
field.set(null, header);
}
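If you were to use it anyway (again, don't), the override would have to run once, before the first connection is opened, roughly like this:
// Hypothetical usage: apply the override once, early, before any connection
// is opened; subsequent requests then carry the forced User-Agent value.
forceAgentHeader("Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");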
Using this override with https.proxyHost, https.proxyPort and http.agent set gives the desired result:
CONNECT www.google.com:443 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0
Host: www.google.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Proxy-Connection: keep-alive
But yeah, don't do that. It's much safer to just use Apache HttpComponents:
final DefaultHttpClient client = new DefaultHttpClient();
HttpHost proxy = new HttpHost("127.0.0.1", 8888, "http");
HttpHost target = new HttpHost("www.google.com", 443, "https");
client.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
HttpProtocolParams
.setUserAgent(client.getParams(),
"Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
final HttpGet get = new HttpGet("/");
HttpResponse response = client.execute(target, get);
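From there the response can be consumed as usual; a minimal sketch, assuming the standard EntityUtils helper that ships with HttpCore:
// Print the status line and body, then release the client's resources.
System.out.println(response.getStatusLine());
System.out.println(org.apache.http.util.EntityUtils.toString(response.getEntity()));
client.getConnectionManager().shutdown();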
I've found/verified the problem by inspecting HTTP communications using Wireshark. Is there any way around this?
There is no problem here. The User-Agent header is set whether the request is transported over HTTP or HTTPS. Even setting it to something unreasonable like blah blah works over HTTPS. The headers shown below were captured while the underlying protocol was HTTPS.
Request headers sent via HTTPS
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
User-Agent: blah blah
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Here's the code that triggers the request.
// localhost:52999 is a reverse proxy to xxx:443
java.net.URL url = new java.net.URL( "https://localhost:52999/" );
java.net.URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
conn.connect();
java.io.BufferedReader serverResponse = new java.io.BufferedReader(new java.io.InputStreamReader(conn.getInputStream()));
System.out.println(serverResponse.readLine());
serverResponse.close();
Normally, HTTPS requests cannot be sniffed (as @Perception mentioned). Piping the request through a proxy that replaces the root CA with its own fake CA will let you see the traffic; a simpler method is to just look at the access log of the target server. But as you can see from the HTTPS request snippet above, the User-Agent header that is sent is correct.