URL Escaping Chinese/Japanese Unicode Characters f

Posted 2019-02-07 03:11

Question:

I'm trying to URL-escape (percent-encode) non-ASCII characters in several URLs I'm dealing with. I'm working with a Flash application that loads resources such as images and sound clips from these URLs. The filenames can contain non-ASCII characters, for example 日本語.jpg, so I escape them by encoding the characters as UTF-8 and then percent-escaping the resulting bytes, which gives the following:

%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg
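For reference, a minimal sketch of that encoding step in JavaScript, assuming the built-in encodeURIComponent is an acceptable stand-in for whatever the application actually uses (the variable names are illustrative only):

```javascript
// Percent-encode a filename as UTF-8 bytes with the built-in encodeURIComponent.
const fileName = "日本語.jpg";
const encoded = encodeURIComponent(fileName);
console.log(encoded); // %E6%97%A5%E6%9C%AC%E8%AA%9E.jpg

// The matching built-in decoder restores the original name.
const roundTrip = decodeURIComponent(encoded);
console.log(roundTrip === fileName); // true
```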

These filenames work fine when I run the app in any browser other than Internet Explorer - I've tried Firefox, Safari and Chrome. But when I launch the app in IE (tried both 6 and 8) and it tries to load the sound clip, I get: Error #2044: Unhandled ioError, and the URL has been corrupted to something like:

æ¥æ¬èª.jpg

Any thoughts on how to fix this? This is just test-driving the Flash app with local filesystem URLs. I've also noticed that Internet Explorer isn't able to locate a file such as file:///C:/%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg, though Chrome and Firefox will decode it and load it just fine for a file with the path

C:\日本語.jpg

edit

I think my problem is the same as the one encountered in the following ActionScript code fragment:

import flash.display.Loader;
import flash.net.URLRequest;
...
var ldr:Loader;
var req:URLRequest = new URLRequest("日本語.jpg");
ldr = new Loader();
ldr.load(req);

Using the string 日本語.jpg will work in IE, while using the string %E6%97%A5%E6%9C%AC%E8%AA%9E.jpg works in other browsers. What I need is a single form that will work in all browsers. I have tried the %u encoding, and I have tried setting the HTTP request header to Content-Type: text/html; charset=utf-8, with no luck in either percent-escaped or unescaped form.

Answer 1:

Sorry, no solution, but maybe at least some more information about what might be going on here. (You've probably already figured this much out, but maybe it will help another reader find a solution.) The "official" URL encoding specification seems to leave the door wide open as to how to decode escaped URLs like the ones you are generating: are the escaped bytes intended to represent UTF-8 characters (as Firefox, etc. are interpreting them) or single-byte characters such as Latin-1 (as IE appears to be interpreting them)? I don't know of any way to force the intended decoding strategy.
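The two readings can be compared in a small JavaScript sketch. It assumes the corrupted name in the question comes from each escaped byte being read as one Latin-1 character, which is consistent with the reported string but is not confirmed by the thread; decodeBytewise is a hypothetical helper written for this illustration.

```javascript
const escaped = "%E6%97%A5%E6%9C%AC%E8%AA%9E.jpg";

// Reading 1: the escaped bytes form a UTF-8 sequence.
const asUtf8 = decodeURIComponent(escaped);

// Reading 2 (assumed): every %XX is a single character with that code.
function decodeBytewise(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}
const asLatin1 = decodeBytewise(escaped);

// Bytes 0x97, 0x9C and 0x9E become invisible C1 control characters;
// strip them to see what would typically be displayed.
const visible = asLatin1.replace(/[\u0080-\u009F]/g, "");

console.log(asUtf8);  // 日本語.jpg
console.log(visible); // æ¥æ¬èª.jpg
```

The second output matches the corrupted name reported in the question, which supports the byte-per-character explanation without proving it.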

Just a question: what goes wrong if you do not escape them at all and leave the Unicode characters in the URL? Although I don't have a lot of experience with it, I seem to remember reading somewhere that the days of needing to escape Unicode in URLs are behind us. I could be wrong about that...



Answer 2:

IE uses UTF-8 for HTTP URLs, but I'm not sure about file URLs (even though I tested the behavior as part of the IE team about 10 years ago). If you are using the URLs in HTML, I'd actually recommend trying string literals (if your page encoding is UTF-8) or numeric character references (&#dddd;). IE will generally convert the characters into an appropriate encoding, which would be UTF-8 for the HTTP stuff and UTF-16 for local file system interactions.
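A minimal sketch of producing decimal numeric character references for use in HTML markup. The helper name toNumericCharRefs is hypothetical, and leaving ASCII characters untouched is an assumption made for readability, not something the answer specifies.

```javascript
// Replace every non-ASCII character with a decimal numeric character
// reference (&#dddd;), leaving ASCII characters as they are.
function toNumericCharRefs(text) {
  let out = "";
  for (const ch of text) {            // iterates by code point, not code unit
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericCharRefs("日本語.jpg")); // &#26085;&#26412;&#35486;.jpg
```

These references are only meaningful to an HTML parser, so they suit attribute values such as src or href rather than strings passed directly to a loader.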

It's actually HTTP that needs the URL-escaping, not the HTML parser.



Answer 3:

Try encoding only the parts of the URI that would cause it to be parsed incorrectly. For instance, encode &, ?, and space. Leave everything else as is, and it should work like a charm.

If you are still running into problems, you may need to set the charset to UTF-8 in your HTTP headers, with something like Content-Type: text/html; charset=UTF-8.
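The selective escaping suggested in this answer could look like the following sketch. The helper escapeReservedOnly is hypothetical, and the exact character set is an assumption: the answer names &, ?, and space, and % and # are added here because they would also change how the URL is parsed.

```javascript
// Escape only characters that would alter URL parsing; leave everything
// else, including non-ASCII characters, exactly as written.
function escapeReservedOnly(name) {
  return name.replace(/[%#&? ]/g, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"));
}

console.log(escapeReservedOnly("日本語 a&b?.jpg")); // 日本語%20a%26b%3F.jpg
```

Whether the unescaped non-ASCII characters then load correctly still depends on the browser and on the Flash runtime, as described in the question.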



Answer 4:

Why not just use Unicode escape sequences? Paste this into the body of an HTML web page to see what I mean:

   <script type="text/javascript">
      var fileName = "日本語.jpg";
      document.write(escape(fileName));
   </script>

I get %u65E5%u672C%u8A9E.jpg.
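One caveat worth checking before relying on this form: the %u notation comes from the legacy escape function and is not part of standard percent-encoding, so standard decoders may reject it. A small sketch of the difference, using only built-in functions (variable names are illustrative):

```javascript
const fileName = "日本語.jpg";

// Legacy escape() emits UTF-16 code units in the non-standard %uXXXX form.
const legacy = escape(fileName);
// encodeURIComponent() emits UTF-8 bytes in standard %XX form.
const standard = encodeURIComponent(fileName);

console.log(legacy);   // %u65E5%u672C%u8A9E.jpg
console.log(standard); // %E6%97%A5%E6%9C%AC%E8%AA%9E.jpg

// Only the legacy unescape() understands %uXXXX; the standard decoder throws.
let standardDecoderAccepts = true;
try {
  decodeURIComponent(legacy);
} catch (err) {
  standardDecoderAccepts = false; // URIError: malformed URI sequence
}
console.log(unescape(legacy) === fileName); // true
console.log(standardDecoderAccepts);        // false
```

This is in line with the question's note that the %u encoding did not help.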



Answer 5:

From what I've tested, IE doesn't decode percent-encoded file URLs, but it does decode ordinary HTTP URLs, so that could be the issue. I'm not sure how you are loading them, but you should look into that.



Answer 6:

The file:// protocol depends on your OS regional settings. If the system locale is set to English rather than Chinese, you can't get IE to do this.