I have a URL object whose path contains unwise characters (RFC 2396); in my case it is the "|" (pipe) character.
Now I need to safely convert it to a URI, but URL.toURI()
throws an exception.
I've read the URL documentation, but this part confuses me:
The URL class does not itself encode or decode any URL components
according to the escaping mechanism defined in RFC2396. It is the
responsibility of the caller to encode any fields, which need to be
escaped prior to calling URL, and also to decode any escaped fields,
that are returned from URL. Furthermore, because URL has no knowledge
of URL escaping, it does not recognize equivalence between the encoded
or decoded form of the same URL.
So how should I do it? What is the pattern for encoding these characters during the conversion? Do I need to create an encoded copy of my URL object?
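To make the problem concrete, here is a minimal sketch that reproduces it. The class name ToUriFailure, the helper toUriFails and the example.com URLs are made up for illustration; the point is only that URL accepts the pipe while toURI() rejects it.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class ToUriFailure {
    /** Returns true when URL.toURI() rejects the given URL string. */
    static boolean toUriFails(String spec) {
        try {
            URL url = new URL(spec); // URL itself does not validate the path
            url.toURI();             // strict URI parsing happens here
            return false;
        } catch (URISyntaxException e) {
            return true;
        } catch (MalformedURLException e) {
            // Not expected for the sample inputs below
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUriFails("http://example.com/a|b.jpg"));   // true
        System.out.println(toUriFails("http://example.com/a%7Cb.jpg")); // false
    }
}
```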
OK, I came up with something like this:
URI uri = new URI(url.getProtocol(),
                  null /*userInfo*/,
                  url.getHost(),
                  url.getPort(),
                  (url.getPath() == null) ? null : URLDecoder.decode(url.getPath(), "UTF-8"),
                  (url.getQuery() == null) ? null : URLDecoder.decode(url.getQuery(), "UTF-8"),
                  null /*fragment*/);
It looks like it works; here is an example. Can someone confirm that this is a proper solution?
Edit: the initial solution had problems when the URL contained a query, so I've fixed it.
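For reference, a self-contained sketch of the same idea. The multi-argument URI constructors quote characters that are illegal in each component, so the pipe and the space come out percent-encoded. The class UrlToUri and its helpers are hypothetical names, and this version assumes the URL holds raw (not yet escaped) text: those constructors always quote "%", so an already escaped path would be escaped twice unless it is decoded first, as in the snippet above. Note also that URLDecoder turns "+" into a space, which may not be wanted in a path.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlToUri {
    /** Builds a URI from the URL's components; illegal characters get quoted. */
    static URI toUri(URL url) {
        try {
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                           url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Sample URL with unescaped "|" and space in the path. */
    static URL sampleUrl() {
        try {
            return new URL("http", "google.com", 8080,
                           "/crapy|path with-unwise_characters.jpg");
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://google.com:8080/crapy%7Cpath%20with-unwise_characters.jpg
        System.out.println(toUri(sampleUrl()));
    }
}
```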
Use URL encoding?
From your example, you currently have:
URL url = new URL("http", "google.com", 8080, "/crapy|path with-unwise_characters.jpg");
Instead, I would use:
String path = "/crapy|path with-unwise_characters.jpg";
URL url = new URL("http", "google.com", 8080, URLEncoder.encode(path, "UTF-8"));
This should work and handle all unwise characters in the path as per the standard URL encoding.
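One caveat worth checking before relying on this: URLEncoder implements HTML form encoding (application/x-www-form-urlencoded), so it also encodes "/" and writes spaces as "+". The sketch below compares encoding the whole path with encoding each segment separately; the class PathEncoding and both helper names are made up for illustration, and the per-segment variant is just one possible workaround.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PathEncoding {
    /** Form-encodes the whole string, slashes included. */
    static String encodeWhole(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Encodes each path segment separately and writes spaces as %20. */
    static String encodePerSegment(String path) {
        String[] segments = path.split("/", -1); // -1 keeps empty segments
        for (int i = 0; i < segments.length; i++) {
            // A literal "+" is already %2B here, so any "+" left is a space
            segments[i] = encodeWhole(segments[i]).replace("+", "%20");
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        String path = "/crapy|path with-unwise_characters.jpg";
        // prints %2Fcrapy%7Cpath+with-unwise_characters.jpg
        System.out.println(encodeWhole(path));
        // prints /crapy%7Cpath%20with-unwise_characters.jpg
        System.out.println(encodePerSegment(path));
    }
}
```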
Apache HttpClient 4 has a class for that, org.apache.http.client.utils.URIBuilder:
URIBuilder builder =
    new URIBuilder()
        .setScheme(url.getProtocol())
        .setHost(url.getHost())
        .setPort(url.getPort())
        .setUserInfo(url.getUserInfo())
        .setPath(url.getPath())
        .setQuery(url.getQuery());
URI uri = builder.build();
return uri;