URI - getHost returns null. Why?

Posted 2020-07-02 11:44

Why is the 1st one returning null, while the 2nd one is returning mail.yahoo.com?

Isn't this weird? If not, what's the logic behind this behavior?

Is the underscore the culprit? Why?

public static void main(String[] args) throws Exception {
    java.net.URI uri = new java.net.URI("http://broken_arrow.huntingtonhelps.com");
    String host = uri.getHost();
    System.out.println("Host = [" + host + "].");

    uri = new java.net.URI("http://mail.yahoo.com");
    host = uri.getHost();
    System.out.println("Host = [" + host + "].");
}

Tags: java
5 answers
虎瘦雄心在
#2 · 2020-07-02 12:23

Consider using new java.net.URL("http://broken_arrow.huntingtonhelps.com").getHost() instead. URL has a different parsing implementation. If you already have a URI instance myUri, call myUri.toURL().getHost().

I ran into this URI issue on OpenJDK 1.8, and it worked fine with URL.
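
A minimal sketch of that difference, assuming a JDK where URI.toURL() hands the URI's string form to URL's own, more lenient parser. The class name UrlHostDemo and the helper hostViaUrl are made up for illustration:

```java
import java.net.URI;
import java.net.URL;

public class UrlHostDemo {
    // Returns the host as seen by java.net.URL rather than java.net.URI.
    static String hostViaUrl(String address) throws Exception {
        URI uri = new URI(address);   // accepted, but uri.getHost() may be null
        URL url = uri.toURL();        // requires an absolute URI
        return url.getHost();
    }

    public static void main(String[] args) throws Exception {
        String address = "http://broken_arrow.huntingtonhelps.com";
        System.out.println("URI host = [" + new URI(address).getHost() + "].");
        System.out.println("URL host = [" + hostViaUrl(address) + "].");
    }
}
```

Going through toURL() also avoids the URL constructors, which newer JDKs mark as deprecated.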

chillily
#3 · 2020-07-02 12:28

As @hsz mentioned in the comments, this is a known bug.

Still, let's debug it and look inside the source of the URI class. The problem is in this method:

private int parseHostname(int start, int n)

Parsing the first URI fails at the line if ((p < n) && !at(p, n, ':')) fail("Illegal character in hostname", p);

This happens because the _ character is not accepted by the scan loop, which allows only letters, digits and the - character (L_ALPHANUM, H_ALPHANUM, L_DASH and H_DASH).

And yes, this is still not fixed in Java 7.
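
The failure can also be observed from the outside without a debugger. As far as I can tell from the URI Javadoc, the constructor does not report the hostname error; it falls back to treating the authority as registry-based, so getHost() is null while getAuthority() still holds the text. Calling parseServerAuthority() forces the server-based parse and surfaces the error. A small sketch (the class and method names are hypothetical):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AuthorityDemo {
    // True if the URI's authority parses as [user-info@]host[:port].
    static boolean hasServerAuthority(URI uri) {
        try {
            uri.parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            // getReason() carries the parser's own message.
            System.out.println("Rejected: " + e.getReason());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        URI broken = new URI("http://broken_arrow.huntingtonhelps.com");
        System.out.println("authority = " + broken.getAuthority());
        System.out.println("host      = " + broken.getHost());
        System.out.println("server-based? " + hasServerAuthority(broken));
        System.out.println("server-based? " + hasServerAuthority(new URI("http://mail.yahoo.com")));
    }
}
```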

一夜七次
#4 · 2020-07-02 12:31

It's because of the underscore in the host part of the URI. Remove the underscore to check: it then works, as shown below:

public static void main(String[] args) throws Exception {
    java.net.URI uri = new java.net.URI("http://brokenarrow.huntingtonhelps.com");
    String host = uri.getHost();
    System.out.println("Host = [" + host + "].");

    uri = new java.net.URI("http://mail.yahoo.com");
    host = uri.getHost();
    System.out.println("Host = [" + host + "].");
}

戒情不戒烟
#5 · 2020-07-02 12:32

As mentioned, this is a known JVM bug. If you need to make an HTTP request to such a host, though, you can still try a workaround. The main idea is to build the request from the host's IP address rather than from the 'wrong' hostname. In that case you also need to add a "Host" header to the request, carrying the correct (original) hostname.

1: Cut the hostname out of the URL (this is a rough example; you can do it in a smarter way):

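// Start of the authority: just past "://", or 0 if the URL has no scheme.
int n = url.indexOf("://");
if (n > 0) { n += 3; } else { n = 0; }
int m = url.indexOf(":", n);  // start of the port, if any
int k = url.indexOf("/", n);  // start of the path, if any
if (k == -1) { k = url.length(); }  // no path: the authority runs to the end
if (m == -1 || m > k) { m = k; }    // no port, or the ':' belongs to the path
String hostHeader = url.substring(n, k);  // host[:port], for the "Host" header
String hostname = url.substring(n, m);    // bare hostname, for the DNS lookup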

2: Resolve the hostname's IP address:

String IP = InetAddress.getByName(hostname).getHostAddress();

3: Build a new URL based on the IP:

String newURL = url.substring(0, n) + IP + url.substring(m);

4: Now use an HTTP library to prepare a request for the new URL (pseudocode):

HttpRequest req = ApacheHTTP.get(newURL);

5: Then add the "Host" header with the correct (original) hostname:

req.addHeader("Host", hostHeader);

6: Now you can send the request (pseudocode):

String resp = req.getResponse().asString();
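
As one possible "smarter way" to do steps 1 and 3, the splitting can be left to java.net.URL instead of indexOf arithmetic. This is only a sketch under some assumptions: the class HostRewrite and its methods are hypothetical names, 203.0.113.7 is a documentation-only placeholder rather than the real address, and the DNS lookup and the request itself are left out.

```java
import java.net.URI;
import java.net.URL;

public class HostRewrite {
    // Value for the "Host" header: host, plus ":port" if the URL names one.
    static String hostHeader(String address) throws Exception {
        URL url = new URI(address).toURL();
        int port = url.getPort();   // -1 when the URL has no explicit port
        return port == -1 ? url.getHost() : url.getHost() + ":" + port;
    }

    // Same URL with the hostname swapped for an already-resolved IP address.
    static String withIp(String address, String ip) throws Exception {
        URL url = new URI(address).toURL();
        int port = url.getPort();
        return url.getProtocol() + "://" + ip
                + (port == -1 ? "" : ":" + port)
                + url.getFile();    // path plus "?query", if present
    }

    public static void main(String[] args) throws Exception {
        String address = "http://broken_arrow.huntingtonhelps.com:8080/index.html?x=1";
        System.out.println(withIp(address, "203.0.113.7"));
        System.out.println("Host: " + hostHeader(address));
    }
}
```

In real use the second argument would come from InetAddress.getByName(url.getHost()).getHostAddress(), as in step 2.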
何必那么认真
#6 · 2020-07-02 12:36

I don't think this is a bug in Java. I think Java is parsing hostnames correctly according to the spec. There are good explanations of the spec here: http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names and here: http://www.netregister.biz/faqit.htm#1

Specifically, hostnames MUST NOT contain underscores.
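
To make the rule concrete, here is a rough sketch of a hostname check along the lines of the letters-digits-hyphen convention those pages describe: each dot-separated label is 1 to 63 characters long, uses only ASCII letters, digits and hyphens, and does not start or end with a hyphen. The class name and the regex are my own, and this is not necessarily the exact check the JDK performs:

```java
import java.util.regex.Pattern;

public class HostnameCheck {
    // One label: alphanumeric at both ends, hyphens allowed in between, max 63 chars.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    static boolean isValidHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // The -1 limit keeps empty labels, so "a..b" and "a." are rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("mail.yahoo.com"));
        System.out.println(isValidHostname("broken_arrow.huntingtonhelps.com"));
    }
}
```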
