How to encode the filename parameter of Content-Di

2018-12-31 01:16发布

Web applications that want to force a resource to be downloaded rather than directly rendered in a Web browser issue a Content-Disposition header in the HTTP response of the form:

Content-Disposition: attachment; filename=FILENAME

The filename parameter can be used to suggest a name for the file into which the resource is downloaded by the browser. RFC 2183 (Content-Disposition), however, states in section 2.3 (The Filename Parameter) that the file name can only use US-ASCII characters:

Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII. We recognize the great desirability of allowing arbitrary character sets in filenames, but it is beyond the scope of this document to define the necessary mechanisms.

There is empirical evidence, nevertheless, that most popular Web browsers today seem to permit non-US-ASCII characters yet (for the lack of a standard) disagree on the encoding scheme and character set specification of the file name. Question is then, what are the various schemes and encodings employed by the popular browsers if the file name “naïvefile” (without quotes and where the third letter is U+00EF) needed to be encoded into the Content-Disposition header?

For the purpose of this question, popular browsers being:

  • Firefox
  • Internet Explorer
  • Safari
  • Google Chrome
  • Opera

17条回答
旧人旧事旧时光
2楼-- · 2018-12-31 01:51

Put you file name in double quotes. Solved the problem for me. Like this:

Content-Disposition: attachment; filename="My Report.doc"

http://kb.mozillazine.org/Filenames_with_spaces_are_truncated_upon_download

查看更多
妖精总统
3楼-- · 2018-12-31 01:51

We had a similar problem in a web application, and ended up by reading the filename from the HTML <input type="file">, and setting that in the url-encoded form in a new HTML <input type="hidden">. Of course we had to remove the path like "C:\fakepath\" that is returned by some browsers.

Of course this does not directly answer OPs question, but may be a solution for others.

查看更多
若你有天会懂
4楼-- · 2018-12-31 01:52

RFC 6266 describes the “Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)”. Quoting from that:

6. Internationalization Considerations

The “filename*” parameter (Section 4.3), using the encoding defined in [RFC5987], allows the server to transmit characters outside the ISO-8859-1 character set, and also to optionally specify the language in use.

And in their examples section:

This example is the same as the one above, but adding the "filename" parameter for compatibility with user agents not implementing RFC 5987:

Content-Disposition: attachment;
                     filename="EURO rates";
                     filename*=utf-8''%e2%82%ac%20rates

Note: Those user agents that do not support the RFC 5987 encoding ignore “filename*” when it occurs after “filename”.

In Appendix D there is also a long list of suggestions to increase interoperability. It also points at a site which compares implementations. Current all-pass tests suitable for common file names include:

  • attwithisofnplain: plain ISO-8859-1 file name with double quotes and without encoding. This requires a file name which is all ISO-8859-1 and does not contain percent signs, at least not in front of hex digits.
  • attfnboth: two parameters in the order described above. Should work for most file names on most browsers, although IE8 will use the “filename” parameter.

That RFC 5987 in turn references RFC 2231, which describes the actual format. 2231 is primarily for mail, and 5987 tells us what parts may be used for HTTP headers as well. Don't confuse this with MIME headers used inside a multipart/form-data HTTP body, which is governed by RFC 2388 (section 4.4 in particular) and the HTML 5 draft.

查看更多
与君花间醉酒
5楼-- · 2018-12-31 01:53

There is discussion of this, including links to browser testing and backwards compatibility, in the proposed RFC 5987, "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters."

RFC 2183 indicates that such headers should be encoded according to RFC 2184, which was obsoleted by RFC 2231, covered by the draft RFC above.

查看更多
冷夜・残月
6楼-- · 2018-12-31 01:57

If you are using a nodejs backend you can use the following code I found here

var fileName = 'my file(2).txt';
var header = "Content-Disposition: attachment; filename*=UTF-8''" 
             + encodeRFC5987ValueChars(fileName);

function encodeRFC5987ValueChars (str) {
    return encodeURIComponent(str).
        // Note that although RFC3986 reserves "!", RFC5987 does not,
        // so we do not need to escape it
        replace(/['()]/g, escape). // i.e., %27 %28 %29
        replace(/\*/g, '%2A').
            // The following are not required for percent-encoding per RFC5987, 
            // so we can allow for a little better readability over the wire: |`^
            replace(/%(?:7C|60|5E)/g, unescape);
}
查看更多
呛了眼睛熬了心
7楼-- · 2018-12-31 01:59

In PHP this did it for me (assuming the filename is UTF8 encoded):

header('Content-Disposition: attachment;'
    . 'filename="' . addslashes(utf8_decode($filename)) . '";'
    . 'filename*=utf-8\'\'' . rawurlencode($filename));

Tested against IE8-11, Firefox and Chrome.
If the browser can interpret filename*=utf-8 it will use the UTF8 version of the filename, else it will use the decoded filename. If your filename contains characters that can't be represented in ISO-8859-1 you might want to consider using iconv instead.

查看更多
登录 后发表回答