Java URL encoding: URLEncoder vs. URI

Posted 2019-03-08 08:36

According to the W3Schools URL encoding page, @ should be encoded as %40, and a space should be encoded as %20.

I've tried both URLEncoder and URI, but neither produces both of those results:

import java.net.URI;
import java.net.URLEncoder;

public class Test {
    public static void main(String[] args) throws Exception {

        // Prints me%40home.com (CORRECT)
        System.out.println(URLEncoder.encode("me@home.com", "UTF-8"));

        // Prints Email+Address (WRONG: Should be Email%20Address)
        System.out.println(URLEncoder.encode("Email Address", "UTF-8"));

        // http://www.home.com/test?Email%20Address=me@home.com
        // (WRONG: it has not encoded the @ in the email address)
        URI uri = new URI("http", "www.home.com", "/test", "Email Address=me@home.com", null);
        System.out.println(uri.toString());
    }
}

For some reason, URLEncoder handles the email address correctly but not spaces, and URI handles spaces correctly but not email addresses.

How should I encode these two parameters so that they match what W3Schools says is correct (or is W3Schools wrong)?

2 answers
等我变得足够好
Reply #2 · 2019-03-08 09:01

Although I think the answer from @fge is the right one, I was using a third-party web service that relied on the encoding described in the W3Schools article, so I followed the answer to the question "Java equivalent to JavaScript's encodeURIComponent that produces identical output?":

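import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Wrapper class name is arbitrary; only the method matters.
public class EncodeUtil {
    // Mimics JavaScript's encodeURIComponent: start from URLEncoder's
    // form encoding, then undo the spots where the two formats differ.
    public static String encodeURIComponent(String s) {
        String result;

        try {
            result = URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")   // form encoding writes a space as +
                    .replace("%21", "!")   // encodeURIComponent leaves ! ' ( ) ~ as-is
                    .replace("%27", "'")
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice: every Java platform must support UTF-8
            result = s;
        }

        return result;
    }
}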
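On Java 10 and later, URLEncoder also has an encode(String, Charset) overload that throws no checked exception, so the helper above can drop its try/catch. A minimal sketch, assuming Java 10+; the class name ComponentEncoder is made up for illustration. It uses String.replace, which matches literally, so no regex escaping is needed:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; assumes Java 10+ for URLEncoder.encode(String, Charset).
public class ComponentEncoder {
    // Same post-processing as the helper above, without the checked exception.
    public static String encodeURIComponent(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")   // form encoding uses + for a space
                .replace("%21", "!")   // encodeURIComponent leaves ! ' ( ) ~ unescaped
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("Email Address")); // Email%20Address
        System.out.println(encodeURIComponent("me@home.com"));   // me%40home.com
    }
}
```

A literal + in the input is not affected by the first replacement: URLEncoder has already turned it into %2B by then.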
对你真心纯属浪费
Reply #3 · 2019-03-08 09:03

URI syntax is defined by RFC 3986 (the permissible content of a query string is defined in section 3.4). Java's URI class is actually specified against the older RFC 2396 (amended by RFC 2732), with a few deviations described in its Javadoc, but on this point the RFCs agree.

In RFC 3986, a query is defined as query = *( pchar / "/" / "?" ), and the pchar grammar rule is:

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

So an unencoded @ is legal in a query string.

Trust URI: it percent-encodes only the characters that are not legal in that part of the URI.
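A short sketch of what that means in practice: the five-argument URI constructor from the question encodes the space, which is illegal in a URI, but leaves the @ alone. getRawQuery() returns the encoded form and getQuery() the decoded one. The class name UriQueryDemo is only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative class name.
public class UriQueryDemo {
    public static void main(String[] args) throws URISyntaxException {
        // Multi-argument constructor: quotes characters that are illegal in a
        // query (space -> %20) but keeps legal ones such as '@' as-is.
        URI uri = new URI("http", "www.home.com", "/test", "Email Address=me@home.com", null);
        System.out.println(uri.getRawQuery()); // Email%20Address=me@home.com
        System.out.println(uri.getQuery());    // Email Address=me@home.com
    }
}
```

One caveat of this approach: & and = are also legal in a query, so the constructor will not escape them inside a parameter value; it cannot separate parameters for you.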

Finally, if you look at the Javadoc of URLEncoder, you will see that it states:

This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.

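That format is not the same thing as a query string as defined by the URI specification: it encodes a space as + and percent-encodes every character except letters, digits and . - * _, including @.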
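To see the form format in action, the sketch below (assuming Java 10+ for the Charset overloads; FormEncodingDemo and formPair are hypothetical names) encodes a name and a value separately, the way an HTML form submission would, and joins them with =:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical names; assumes Java 10+.
public class FormEncodingDemo {
    // Encodes one name/value pair in application/x-www-form-urlencoded format.
    public static String formPair(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pair = formPair("Email Address", "me@home.com");
        System.out.println(pair); // Email+Address=me%40home.com

        // URLDecoder understands the same format and maps '+' back to a space.
        String name = pair.substring(0, pair.indexOf('='));
        System.out.println(URLDecoder.decode(name, StandardCharsets.UTF_8)); // Email Address
    }
}
```

A server that parses form data with URLDecoder turns the + back into a space, whereas a plain RFC 3986 percent-decoder such as URI.getQuery() leaves + as a literal character, which is why mixing the two formats causes trouble.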
