I have a URI string like the following:
http://www.christlichepartei%F6sterreichs.at/steiermark/
I'm creating a java.net.URI instance from this string. That succeeds, but when I try to retrieve the host, getHost() returns null. Opera and Firefox also choke on this URL if I enter it exactly as shown above. Shouldn't the URI class throw a URISyntaxException if the string is invalid? If not, how can I detect that the URI is illegal?
It also behaves the same when I decode the string using URLDecoder which yields
http://www.christlicheparteiösterreichs.at/steiermark/
Now this is accepted by Opera and Firefox but java.net.URI still doesn't like it. How can I deal with such a URL?
thanks
Java 6 has the IDN class for working with internationalized domain names. The following produces a URI with an encoded hostname:
URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/");
The standard ASCII encoding for non-ASCII characters in hostnames is known as "Punycode"; IDN.toASCII applies it to each label of the domain name.
URI does throw a URISyntaxException when you use a constructor that takes the host as a separate argument:
URI someUri=new URI("http","www.christlicheparteiösterreichs.at","/steiermark",null);
java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark
You can fix this by converting the host with IDN first:
URI someUri=new URI("http",IDN.toASCII("www.christlicheparteiösterreichs.at"),"/steiermark",null);
System.out.println(someUri);
System.out.println("host: "+someUri.getHost());
Output:
http://www.xn--christlicheparteisterreichs-5yc.at/steiermark
host: www.xn--christlicheparteisterreichs-5yc.at
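If the URI has to be created from the complete string, the problem can probably also be detected with URI.parseServerAuthority(), which throws a URISyntaxException when the authority cannot be parsed as a server-based one (user info, host, port). The following is only a minimal sketch; the class name HostCheck and the method hasValidHost are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostCheck {

    /**
     * Hypothetical helper: returns true only if the string parses as a URI
     * and its authority contains a host that java.net.URI accepts.
     */
    public static boolean hasValidHost(String spec) {
        try {
            // parseServerAuthority() re-parses the authority strictly and
            // throws instead of silently leaving the host as null.
            URI uri = new URI(spec).parseServerAuthority();
            // URIs without any authority (e.g. mailto:) have no host either.
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasValidHost("http://www.christlichepartei%F6sterreichs.at/steiermark/"));
        System.out.println(hasValidHost("http://www.xn--christlicheparteisterreichs-5yc.at/steiermark/"));
    }
}
```

Checking getHost() for null after the normal single-argument constructor would give the same yes/no answer; parseServerAuthority() is mainly useful when you want the exception message and index.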
UPDATE regarding the chicken-and-egg problem:
You can let URL do the job:
public static URI createSafeURI(final URL someURL) throws URISyntaxException
{
return new URI(someURL.getProtocol(),someURL.getUserInfo(),IDN.toASCII(someURL.getHost()),someURL.getPort(),someURL.getPath(),someURL.getQuery(),someURL.getRef());
}
URI raoul=createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));
This is just a quick shot; I have not checked all the issues involved in converting a URL to a URI. Use it as a starting point.
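One of those issues: the multi-argument URI constructors always quote the percent character, so a path that is already escaped (for example /a%20b/) would end up double-encoded by createSafeURI above. A possible variant, sketched under the assumption that the host is a plain DNS name (not an IPv6 literal), is to reassemble the string and hand it to the single-argument constructor. The names SafeUri and fromString are hypothetical, and new URL(String) is deprecated in recent JDKs although it still works.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class SafeUri {

    /**
     * Hypothetical helper: lets URL split the string, converts only the host
     * with IDN.toASCII, and keeps path, query and fragment exactly as given.
     * All failures are reported as IllegalArgumentException.
     */
    public static URI fromString(String spec) {
        try {
            URL url = new URL(spec);
            StringBuilder sb = new StringBuilder();
            sb.append(url.getProtocol()).append("://");
            if (url.getUserInfo() != null) {
                sb.append(url.getUserInfo()).append('@');
            }
            // Only the host is converted; it is the part URI rejects.
            sb.append(IDN.toASCII(url.getHost()));
            if (url.getPort() != -1) {
                sb.append(':').append(url.getPort());
            }
            // getFile() is the path plus "?query" when a query is present.
            sb.append(url.getFile());
            if (url.getRef() != null) {
                sb.append('#').append(url.getRef());
            }
            // The single-argument constructor does not re-quote '%'.
            return new URI(sb.toString());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert to URI: " + spec, e);
        }
    }

    public static void main(String[] args) {
        URI uri = fromString("http://www.christlichepartei\u00f6sterreichs.at/a%20b/readme.html?x=1#important");
        System.out.println(uri);
        System.out.println("host: " + uri.getHost());
    }
}
```

Characters in the path or query that URI itself considers illegal (a literal space, for instance) will still cause an exception here, so this is a starting point as well, not a complete solution.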