How to convert URL toURI when there are unwise cha

Posted 2019-04-28 12:59

I've got a URL object whose path contains unwise characters (RFC 2396); in my case it is the "|" (pipe) character. I need to safely convert it to a URI, but URL.toURI() throws an exception.

I've read the URL documentation, but I find this part confusing:

The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC2396. It is the responsibility of the caller to encode any fields, which need to be escaped prior to calling URL, and also to decode any escaped fields, that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognize equivalence between the encoded or decoded form of the same URL.

So how should I do it? What is the pattern for encoding these characters during conversion? Do I need to create an encoded copy of my URL object?
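The problem can be reproduced with a short sketch. The class name `ToUriFailureDemo`, the helper `convertsCleanly`, and the host `example.com` are placeholders chosen for illustration; they are not from the original post.

```java
import java.net.URISyntaxException;
import java.net.URL;

public class ToUriFailureDemo {
    // Hypothetical helper: reports whether URL.toURI() accepts the URL as-is.
    static boolean convertsCleanly(URL url) {
        try {
            url.toURI();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // URL stores the path verbatim; it neither escapes nor rejects the pipe.
        URL raw = new URL("http", "example.com", 8080, "/crapy|path.jpg");
        System.out.println(raw.getPath());        // prints "/crapy|path.jpg"
        System.out.println(convertsCleanly(raw)); // prints "false"

        // The same path with the pipe already escaped converts without error.
        URL escaped = new URL("http", "example.com", 8080, "/crapy%7Cpath.jpg");
        System.out.println(convertsCleanly(escaped)); // prints "true"
    }
}
```

This matches the quoted documentation: URL leaves escaping to the caller, and the stricter URI parser is what rejects the unescaped character.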

Tags: java http url uri
3 answers
在下西门庆
Reply #2 · 2019-04-28 13:14

OK, I came up with something like this:

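// Decode first: the multi-argument URI constructor always quotes '%',
// so existing escapes would otherwise be double-encoded.
// Note that URLDecoder.decode also turns '+' into a space.
URI uri = new URI(url.getProtocol(),
                  null /* userInfo */,
                  url.getHost(),
                  url.getPort(),
                  (url.getPath() == null) ? null : URLDecoder.decode(url.getPath(), "UTF-8"),
                  (url.getQuery() == null) ? null : URLDecoder.decode(url.getQuery(), "UTF-8"),
                  null /* fragment */);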

It looks like it works; here is an example. Can someone confirm that this is a proper solution?

Edit: the initial solution had some problems when there was a query, so I've fixed it.
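For reference, the same idea as a self-contained sketch. The class `UrlToUriSketch`, the helpers `toUri` and `decode`, and the host `example.com` are hypothetical names for illustration. Unlike the snippet above, this version also passes through the user info and fragment and protects a literal "+" before decoding; those are assumptions about what a caller might want, not part of the original answer. It needs Java 10+ for the `Charset` overload of `URLDecoder.decode`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlToUriSketch {
    // Hypothetical helper: rebuilds a URI from the URL's components so that the
    // multi-argument URI constructor quotes illegal characters such as '|' and ' '.
    static URI toUri(URL url) throws URISyntaxException {
        return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(),
                decode(url.getPath()), decode(url.getQuery()), decode(url.getRef()));
    }

    // Decodes existing %XX escapes, because the constructor always quotes '%'.
    // A literal '+' is escaped first so URLDecoder does not turn it into a space.
    // Caveat: a decoded "%26" becomes a plain '&', which can change a query's meaning.
    private static String decode(String part) {
        if (part == null) {
            return null;
        }
        return URLDecoder.decode(part.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http", "example.com", 8080, "/crapy|path with-unwise_characters.jpg");
        // prints "http://example.com:8080/crapy%7Cpath%20with-unwise_characters.jpg"
        System.out.println(toUri(url));
    }
}
```

Because the query is decoded as a whole, this approach is only safe when the query contains no escaped delimiters; otherwise each parameter would have to be handled separately.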

贼婆χ
Reply #3 · 2019-04-28 13:15

HttpClient 4 has a class for that, org.apache.http.client.utils.URIBuilder:

URIBuilder builder = new URIBuilder()
        .setScheme(url.getProtocol())
        .setHost(url.getHost())
        .setPort(url.getPort())
        .setUserInfo(url.getUserInfo())
        .setPath(url.getPath())
        .setQuery(url.getQuery());
URI uri = builder.build();
return uri;
我想做一个坏孩纸
Reply #4 · 2019-04-28 13:28

Use URL encoding?

From your example, you currently have:

URL url = new URL("http", "google.com", 8080, "/crapy|path with-unwise_characters.jpg");

Instead, I would use:

String path = "/crapy|path with-unwise_characters.jpg";
URL url = new URL("http", "google.com", 8080, URLEncoder.encode(path, "UTF-8"));

This should handle the unwise characters in the path, though URLEncoder applies form encoding (application/x-www-form-urlencoded), so it also escapes "/" as %2F and turns spaces into "+".
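To see what URLEncoder does to a whole path, here is a minimal sketch that prints the result and then tries a per-segment variant. The class `PathEncodingSketch`, the helper `encodePath`, and the host `example.com` are hypothetical; the per-segment approach is one possible workaround, not something the answer above proposes. It needs Java 10+ for the `Charset` overload of `URLEncoder.encode`.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathEncodingSketch {
    // Hypothetical helper: encodes each path segment separately so that the
    // "/" separators survive, then rewrites the form-style "+" as "%20".
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(URLEncoder.encode(segments[i], StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String path = "/crapy|path with-unwise_characters.jpg";

        // Encoding the whole path also escapes the leading slash:
        // prints "%2Fcrapy%7Cpath+with-unwise_characters.jpg"
        System.out.println(URLEncoder.encode(path, StandardCharsets.UTF_8));

        // Encoding per segment keeps the path structure, and toURI() accepts it:
        // prints "http://example.com:8080/crapy%7Cpath%20with-unwise_characters.jpg"
        URL url = new URL("http", "example.com", 8080, encodePath(path));
        URI uri = url.toURI();
        System.out.println(uri);
    }
}
```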
