Java - how to encode URL path for non-Latin characters

Posted 2019-02-19 14:15

Currently the code does final URL url = new URL(urlString); but the server does not accept non-ASCII characters in the path.

Using Java (Android), I need to encode the URL from

http://acmeserver.com/download/agc/fcms/儿子去哪儿/儿子去哪儿.png

to

http://acmeserver.com/download/agc/fcms/%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF/%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF.png

just like browsers do.

I checked URLEncoder.encode(s, "UTF-8"); but it also encodes the colon and the / slashes:

http%3A%2F%2Facmeserver.com%2Fdownload%2Fagc%2Ffcms%2F%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF%2F%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF.png

Is there a simple way to do this without manually parsing the string that the method receives?

from http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars

B.2.1 Non-ASCII characters in URI attribute values

Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:

<A href="http://foo.org/Håkon">...</A>

We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

  1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
  2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
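The two steps of that convention can be sketched directly in Java. This is only an illustration, not a full URL encoder: the class and method names (NonAsciiEscaper, escapeNonAscii) are made up for this example, and everything in the ASCII range, including spaces and reserved characters, is passed through unchanged.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiEscaper {

    /**
     * Hypothetical helper: applies the two-step convention quoted above.
     * Each non-ASCII character is converted to its UTF-8 bytes, and each
     * byte is written as %HH. ASCII characters are copied as-is.
     */
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            int charCount = Character.charCount(codePoint);
            if (codePoint < 0x80) {
                // ASCII: leave untouched (no check for reserved characters)
                out.append((char) codePoint);
            } else {
                // Step 1: represent the character in UTF-8
                byte[] bytes = input.substring(i, i + charCount)
                        .getBytes(StandardCharsets.UTF_8);
                // Step 2: escape every byte as %HH
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += charCount;
        }
        return out.toString();
    }
}
```

Applied to the URL from the question, this should produce the browser-style form shown above, since every character outside the ASCII range ends up as a run of %HH escapes while the slashes and the colon stay intact.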

3 answers
贪生不怕死
#2 · 2019-02-19 14:54

I did it as below, which is cumbersome

        //was: final URL url = new URL(urlString);
        String asciiString;
        try {
            asciiString = new URL(urlString).toURI().toASCIIString();
        } catch (URISyntaxException e1) {
            Log.e(TAG, "Error new URL(urlString).toURI().toASCIIString() " + urlString + " : " + e1);
            return null;
        }
        Log.v(TAG, urlString+" -> "+ asciiString );
        final URL url = new URL(asciiString);

url is later used in

        connection = (HttpURLConnection) url.openConnection();
Viruses.
#3 · 2019-02-19 15:00

You should encode only the special characters and then piece the parts back together. If you try to encode the entire URI, you will run into problems.

Stick with:

String query = URLEncoder.encode("apples oranges", "utf-8");
String url = "http://stackoverflow.com/search?q=" + query;

Check out this great guide on URL encoding.
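For a path rather than a query string, the same "encode the pieces, then join them" idea might look like the sketch below. The class and method names (PathSegmentEncoder, encodePath, encodeSegment) are hypothetical. Note that URLEncoder implements HTML form encoding, not path encoding: it turns a space into "+", which is replaced with "%20" here, and it also escapes some characters that are legal in a path, such as "~".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PathSegmentEncoder {

    /** Hypothetical helper: encodes each segment of a path, keeping the slashes. */
    public static String encodePath(String path) {
        // limit -1 keeps leading/trailing empty segments, so "/a/" round-trips
        String[] segments = path.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(segments[i]));
        }
        return out.toString();
    }

    /** Form-encodes one segment, then rewrites "+" (an encoded space) as "%20". */
    static String encodeSegment(String segment) {
        try {
            // A literal '+' in the input is already "%2B" at this point,
            // so every remaining '+' stands for a space.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }
}
```

The scheme and host would be left alone, with only the path run through encodePath before the URL string is put back together.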

That being said, a little bit of searching suggests that there may be other ways to do what you want:

Give this a try:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

(You will need to have those spaces encoded so you can use it for a request.)

This takes advantage of a couple of features available in the Android classes. First, the URL class can break a URL into its components, so there is no need for any string search/replace work. Second, the URI class escapes each component properly when you construct a URI from components rather than from a single string.

The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
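One caveat, as far as I understand the java.net.URI documentation: the multi-argument constructors quote illegal characters such as spaces, but they leave non-ASCII letters in place, and only toASCIIString() converts those to %HH escapes. A minimal sketch combining the two steps follows; the class and method names (UrlPathEncoder, toAsciiUrl) are made up for illustration, and checked exceptions are wrapped only to keep the example short.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlPathEncoder {

    /**
     * Hypothetical helper: splits the URL into components, rebuilds it as a
     * URI (which quotes illegal characters such as spaces), then asks for
     * the ASCII-only form (which percent-encodes non-ASCII characters).
     */
    public static String toAsciiUrl(String urlString) {
        try {
            URL url = new URL(urlString);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + urlString, e);
        }
    }
}
```

With the example above, toAsciiUrl should return http://abc.dev.domain.com/0007AC/ads/800x480%2015sec%20h.264.mp4, and new URL(...) can be applied to the result before opening a connection. The input must not already contain percent escapes: these constructors always quote "%", so an existing %20 would become %2520.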

劫难
#4 · 2019-02-19 15:05
final URL url = new URL( new URI(urlString).toASCIIString() );

worked for me.
