Why is the first one returning null, while the second one is returning mail.yahoo.com?
Isn't this weird? If not, what's the logic behind this behavior?
Is the underscore the culprit? Why?
public static void main(String[] args) throws Exception {
    java.net.URI uri = new java.net.URI("http://broken_arrow.huntingtonhelps.com");
    String host = uri.getHost();
    System.out.println("Host = [" + host + "].");

    uri = new java.net.URI("http://mail.yahoo.com");
    host = uri.getHost();
    System.out.println("Host = [" + host + "].");
}
Consider using:
new java.net.URL("http://broken_arrow.huntingtonhelps.com").getHost()
instead. It has an alternative parsing implementation. If you have a URI myUri instance, then call myUri.toURL().getHost(). I faced this URI issue in OpenJDK 1.8 and it worked fine with URL.
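A minimal sketch of that fallback, assuming you start from a URL string. The class name HostViaUrl and the helper hostOf are made up for this illustration; only URI.getHost(), URI.toURL() and URL.getHost() come from the JDK.

```java
import java.net.URI;
import java.net.URL;

class HostViaUrl {
    // Hypothetical helper: try URI.getHost() first and fall back to
    // URL.getHost(), which does not reject underscores in the host.
    static String hostOf(String spec) throws Exception {
        URI uri = new URI(spec);
        String host = uri.getHost();
        if (host != null) {
            return host;
        }
        URL url = uri.toURL();
        return url.getHost();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Host = [" + hostOf("http://broken_arrow.huntingtonhelps.com") + "].");
        System.out.println("Host = [" + hostOf("http://mail.yahoo.com") + "].");
    }
}
```

Going through uri.toURL() rather than new URL(String) should behave the same here and avoids the String constructor, which is deprecated in recent JDKs.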
As mentioned in the comments by @hsz, it is a known bug.
But let's debug and look inside the sources of the URI class. The problem is inside the method private int parseHostname(int start, int n). Parsing the first URI fails at this line:
if ((p < n) && !at(p, n, ':')) fail("Illegal character in hostname", p);
This is because the _ symbol isn't foreseen inside the scan block. It allows only alphas, digits and the - symbol (L_ALPHANUM, H_ALPHANUM, L_DASH and H_DASH).
And yes, this is not fixed yet in Java 7.
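You can surface that parse failure without a debugger. new URI(String) is lenient: when the hostname does not parse, it treats the authority as registry-based and leaves the host null instead of throwing. Calling URI.parseServerAuthority() forces the strict parse and reports the error. A small sketch (the class and method names are invented for this example, and the exact reason text may vary between JDK versions):

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostnameCheck {
    // Returns the parser's complaint about the authority, or null if the
    // authority parses as a normal server-based one (host[:port]).
    static String serverAuthorityError(String spec) throws URISyntaxException {
        URI uri = new URI(spec); // lenient: does not throw for the underscore
        try {
            uri.parseServerAuthority(); // strict: re-parses the authority
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serverAuthorityError("http://broken_arrow.huntingtonhelps.com"));
        System.out.println(serverAuthorityError("http://mail.yahoo.com"));
    }
}
```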
It's because of the underscore in the base URI. Just remove the underscore to check that out: without it, getHost() works.
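A quick way to check this, as a sketch. The underscore-free hostnames here are made up purely for comparison and are not claimed to resolve to anything.

```java
import java.net.URI;

class NoUnderscore {
    // Thin wrapper so the three cases read the same way.
    static String host(String spec) {
        return URI.create(spec).getHost();
    }

    public static void main(String[] args) {
        // Original name with the underscore: host is null.
        System.out.println("Host = [" + host("http://broken_arrow.huntingtonhelps.com") + "].");
        // Hypothetical variants without the underscore: host is returned.
        System.out.println("Host = [" + host("http://brokenarrow.huntingtonhelps.com") + "].");
        System.out.println("Host = [" + host("http://broken-arrow.huntingtonhelps.com") + "].");
    }
}
```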
As mentioned, it is a known JVM bug. However, if you want to make an HTTP request to such a host, you can still try a workaround. The main idea is to build the request from the IP address, not from the 'wrong' hostname. In that case you also need to add a "Host" header to the request, carrying the correct (original) hostname.
1: Cut the hostname out of the URL (a rough approach is enough; you can use a smarter one).
2: Get the hostname's IP.
3: Construct a new URL based on the IP.
4: Use an HTTP library to prepare a request for the new URL.
5: Add a "Host" header with the correct (original) hostname.
6: Send the request.
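The steps above could be sketched roughly as follows. This is an illustration under several assumptions, not the original answer's code. The names HostHeaderWorkaround, extractHost, replaceHost and openByIp are invented here. It uses HttpURLConnection as the HTTP library and assumes plain http and an IPv4 address. Note that HttpURLConnection treats "Host" as a restricted header and ignores it unless the JVM is started with -Dsun.net.http.allowRestrictedHeaders=true, so another HTTP client may be a better fit.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.URI;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostHeaderWorkaround {
    // Step 1: rough hostname extraction (no userinfo or IPv6 handling).
    static String extractHost(String url) {
        int start = url.indexOf("://");
        start = (start < 0) ? 0 : start + 3;
        int end = start;
        while (end < url.length() && "/:?#".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(start, end);
    }

    // Step 3: swap the first occurrence of the hostname for the IP.
    static String replaceHost(String url, String host, String ip) {
        return url.replaceFirst(Pattern.quote(host), Matcher.quoteReplacement(ip));
    }

    // Steps 2, 4 and 5: resolve the IP, open a connection to the IP-based
    // URL and set the original hostname as the "Host" header.
    static HttpURLConnection openByIp(String originalUrl) throws IOException {
        String host = extractHost(originalUrl);
        String ip = InetAddress.getByName(host).getHostAddress(); // assumes IPv4
        URL ipUrl = URI.create(replaceHost(originalUrl, host, ip)).toURL();
        HttpURLConnection conn = (HttpURLConnection) ipUrl.openConnection();
        // Ignored unless sun.net.http.allowRestrictedHeaders=true is set.
        conn.setRequestProperty("Host", host);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Step 6: the request is sent when the response is first read.
        HttpURLConnection conn = openByIp("http://broken_arrow.huntingtonhelps.com/");
        System.out.println("Status = " + conn.getResponseCode());
    }
}
```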
I don't think it's a bug in Java. I think Java is parsing hostnames correctly according to the spec. There are good explanations of the spec here: http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names and here: http://www.netregister.biz/faqit.htm#1
Specifically, hostnames MUST NOT contain underscores.