Encode and Decode rfc2396 URLs

Posted 2020-02-09 04:11

Question:

What is the best way to encode URL strings so that they are RFC 2396 compliant, and to decode an RFC 2396 compliant string so that, for example, %20 is replaced with a space character?

Edit: The URLEncoder and URLDecoder classes do not encode or decode RFC 2396 compliant URLs. They encode to the MIME type application/x-www-form-urlencoded, which is used for HTML form parameter data.

Answer 1:

Use the URI class as follows:

URI uri = new URI("http", "//www.someurl.com/has spaces in url", null);
URL url = uri.toURL();

or if you want a String:

String urlString = uri.toASCIIString();


Answer 2:

Your component parts, potentially containing characters that must be escaped, should already have been escaped using URLEncoder before being concatenated into a URI.

If you have a URI containing out-of-band characters (like space, "<>[]{}\|^`, and non-ASCII bytes), it's not really a URI. You can try to fix it up by manually %-escaping those characters, but this is a last-ditch fix-up operation and not a standard form of encoding. It is usually necessary when you are accepting potentially malformed URIs from user input, but it's not a standardised operation and I don't know of any built-in Java library function that will do it for you; you may have to hack something up yourself with a regular expression.

In the other direction, you must take your URI apart into its component parts (each separate path part, query parameter name and value, and so on) before you can unescape each part (using URLDecoder). There is no sensible way to %-decode a whole URI in one go; you could try to ‘decode %-escapes that do not decode to delimiters’ (like /?=&;%), but you would be left with a strange, inconsistent string that doesn't conform to any URI-processing standard.
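To illustrate why the parts have to be separated before unescaping, here is a minimal sketch that decodes only the query component. The class name QueryDecoder, the method queryParams, and the example.com URL are made up for illustration; the sketch assumes Java 10 or later (for the Charset overload of URLDecoder.decode) and UTF-8 escapes. Path parts need slightly different handling, as the answer goes on to explain.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: split the raw query first, then decode each name and value.
final class QueryDecoder {

    static Map<String, String> queryParams(URI uri) {
        Map<String, String> out = new LinkedHashMap<>();
        String rawQuery = uri.getRawQuery();   // still %-escaped
        if (rawQuery == null) {
            return out;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            // Split on the first literal '=' while the pair is still escaped,
            // so an escaped %3D inside a name cannot be mistaken for the delimiter.
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/search?k%3D1=v%261&x=y+z");
        // prints {k=1=v&1, x=y z}
        System.out.println(queryParams(uri));
    }
}
```

The first parameter here has the name "k=1" and the value "v&1". Decoding the whole query string first would have turned the escaped delimiters into real ones and made the pairs impossible to split correctly.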

URLEncoder/URLDecoder are fine for handling URI query components, both names and values. However, they are not quite right for handling URI path part components. The difference is that the ‘+’ character does not mean a space in a path part. You can fix this up with a simple string replace: after URLEncoding, replace ‘+’ with ‘%20’; before URLDecoding, replace ‘+’ with ‘%2B’. You can ignore the difference if you are not planning to include segments containing spaces or pluses in your path.
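The string-replace fix-up described above could look roughly like this. It is only a sketch: the class name PathSegmentCodec and its two methods are invented for illustration, and it assumes Java 10 or later and UTF-8 as the character encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a single path segment (not a whole path or URI).
final class PathSegmentCodec {

    // URLEncoder writes a space as '+'; in a path segment it has to be %20.
    // A literal '+' in the input has already become %2B, so the replace is safe.
    static String encodeSegment(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // URLDecoder would turn '+' into a space; protect literal pluses first.
    static String decodeSegment(String escaped) {
        return URLDecoder.decode(escaped.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("a b+c/d"));   // prints a%20b%2Bc%2Fd
        System.out.println(decodeSegment("a%20b+c"));   // prints a b+c
    }
}
```

Note that the slash inside the segment is escaped as %2F, which is what you want when the slash is data rather than a path delimiter; join the encoded segments with literal ‘/’ characters afterwards.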



Answer 3:

The javadocs recommend using the java.net.URI class to accomplish the encoding. To ensure that the URI class properly encodes the URL, one of the multi-argument constructors must be used. These constructors perform the required encoding, but they require you to split any URL string into its components and pass those as the parameters.

If you want to decode, you must construct the URI with the single argument constructor, which does not do any encoding. You can then call methods such as getPath() etc. to retrieve and build the decoded URL.
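A small round trip along these lines might look as follows. The class name UriRoundTrip, its helper methods, and the www.example.com values are placeholders chosen for this sketch; only the java.net.URI calls themselves come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper around the two java.net.URI usages described above.
final class UriRoundTrip {

    // Encode: the five-argument constructor quotes illegal characters per component.
    static String encode(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode: the single-argument constructor parses without encoding;
    // getPath() returns the decoded path (getRawPath() would keep the escapes).
    static String decodedPath(String escapedUri) {
        try {
            return new URI(escapedUri).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("http", "www.example.com", "/has spaces/in path", "q=a b");
        // prints http://www.example.com/has%20spaces/in%20path?q=a%20b
        System.out.println(encoded);
        // prints /has spaces/in path
        System.out.println(decodedPath(encoded));
    }
}
```

One caveat from the URI javadoc: the multi-argument constructors always quote the ‘%’ character, so they should be given unescaped components, otherwise existing escapes get encoded a second time.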



Answer 4:

Use java.net.URLEncoder and java.net.URLDecoder.



Tags: java rfc2396