How to encode the filename parameter of Content-Di

2018-12-31 01:16发布

Web applications that want to force a resource to be downloaded rather than directly rendered in a Web browser issue a Content-Disposition header in the HTTP response of the form:

Content-Disposition: attachment; filename=FILENAME

The filename parameter can be used to suggest a name for the file into which the resource is downloaded by the browser. RFC 2183 (Content-Disposition), however, states in section 2.3 (The Filename Parameter) that the file name can only use US-ASCII characters:

Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII. We recognize the great desirability of allowing arbitrary character sets in filenames, but it is beyond the scope of this document to define the necessary mechanisms.

There is empirical evidence, nevertheless, that most popular Web browsers today seem to permit non-US-ASCII characters yet (for the lack of a standard) disagree on the encoding scheme and character set specification of the file name. Question is then, what are the various schemes and encodings employed by the popular browsers if the file name “naïvefile” (without quotes and where the third letter is U+00EF) needed to be encoded into the Content-Disposition header?

For the purpose of this question, popular browsers being:

  • Firefox
  • Internet Explorer
  • Safari
  • Google Chrome
  • Opera

17条回答
何处买醉
2楼-- · 2018-12-31 01:59

I found out solution, that works for all my browsers (ie. all browsers I have installed - IE8, FF16, Opera 12, Chrome 22).

My solution is described in other thread: Java servlet download filename special characters

My solution is based on the fact, how browsers trying to read value from filename parameter. If there is no charset specified in the filename parameter (for example filename*=utf-8''test.xml) browsers expect that value is encoded in browser's native encoding.

Different browsers expect diffrent native encoding. Usually browser's native encoding is utf-8 (FireFox, Opera, Chrome). But IE's native encoding is Win-1250. (I don't know anything about other browsers.)

Hence, if we put value into filename parametr, that is encoded by utf-8/win-1250 according to user's browser, it should work. At least, it works for me.

In short, if we have file named omáčka.xml,
for FireFox, Opera and Chrome I response this header (encoded in utf-8):

Content-Disposition: attachment; filename="omáčka.xml"

and for IE I response this header (encoded in win-1250):

Content-Disposition: attachment; filename="omáèka.jpg"

Java example is in my post that is mentioned above.

查看更多
还给你的自由
3楼-- · 2018-12-31 02:03

I use the following code snippets for encoding (assuming fileName contains the filename and extension of the file, i.e.: test.txt):


PHP:

if ( strpos ( $_SERVER [ 'HTTP_USER_AGENT' ], "MSIE" ) > 0 )
{
     header ( 'Content-Disposition: attachment; filename="' . rawurlencode ( $fileName ) . '"' );
}
else
{
     header( 'Content-Disposition: attachment; filename*=UTF-8\'\'' . rawurlencode ( $fileName ) );
}

Java:

fileName = request.getHeader ( "user-agent" ).contains ( "MSIE" ) ? URLEncoder.encode ( fileName, "utf-8") : MimeUtility.encodeWord ( fileName );
response.setHeader ( "Content-disposition", "attachment; filename=\"" + fileName + "\"");
查看更多
深知你不懂我心
4楼-- · 2018-12-31 02:03

I ended up with the following code in my "download.php" script (based on this blogpost and these test cases).

$il1_filename = utf8_decode($filename);
$to_underscore = "\"\\#*;:|<>/?";
$safe_filename = strtr($il1_filename, $to_underscore, str_repeat("_", strlen($to_underscore)));

header("Content-Disposition: attachment; filename=\"$safe_filename\""
.( $safe_filename === $filename ? "" : "; filename*=UTF-8''".rawurlencode($filename) ));

This uses the standard way of filename="..." as long as there are only iso-latin1 and "safe" characters used; if not, it adds the filename*=UTF-8'' url-encoded way. According to this specific test case, it should work from MSIE9 up, and on recent FF, Chrome, Safari; on lower MSIE version, it should offer filename containing the ISO8859-1 version of the filename, with underscores on characters not in this encoding.

Final note: the max. size for each header field is 8190 bytes on apache. UTF-8 can be up to four bytes per character; after rawurlencode, it is x3 = 12 bytes per one character. Pretty inefficient, but it should still be theoretically possible to have more than 600 "smiles" %F0%9F%98%81 in the filename.

查看更多
姐姐魅力值爆表
5楼-- · 2018-12-31 02:03

Classic ASP Solution

Most modern browsers support passing the Filename as UTF-8 now but as was the case with a File Upload solution I use that was based on FreeASPUpload.Net (site no longer exists, link points to archive.org) it wouldn't work as the parsing of the binary relied on reading single byte ASCII encoded strings, which worked fine when you passed UTF-8 encoded data until you get to characters ASCII doesn't support.

However I was able to find a solution to get the code to read and parse the binary as UTF-8.

Public Function BytesToString(bytes)    'UTF-8..
  Dim bslen
  Dim i, k , N 
  Dim b , count 
  Dim str

  bslen = LenB(bytes)
  str=""

  i = 0
  Do While i < bslen
    b = AscB(MidB(bytes,i+1,1))

    If (b And &HFC) = &HFC Then
      count = 6
      N = b And &H1
    ElseIf (b And &HF8) = &HF8 Then
      count = 5
      N = b And &H3
    ElseIf (b And &HF0) = &HF0 Then
      count = 4
      N = b And &H7
    ElseIf (b And &HE0) = &HE0 Then
      count = 3
      N = b And &HF
    ElseIf (b And &HC0) = &HC0 Then
      count = 2
      N = b And &H1F
    Else
      count = 1
      str = str & Chr(b)
    End If

    If i + count - 1 > bslen Then
      str = str&"?"
      Exit Do
    End If

    If count>1 then
      For k = 1 To count - 1
        b = AscB(MidB(bytes,i+k+1,1))
        N = N * &H40 + (b And &H3F)
      Next
      str = str & ChrW(N)
    End If
    i = i + count
  Loop

  BytesToString = str
End Function

Credit goes to Pure ASP File Upload by implementing the BytesToString() function from include_aspuploader.asp in my own code I was able to get UTF-8 filenames working.


Useful Links

查看更多
永恒的永恒
6楼-- · 2018-12-31 02:05

There is a simple and very robust alternative: use a URL that contains the filename you want.

When the name after the last slash is the one you want, you don't need any extra headers!

This trick works:

/real_script.php/fake_filename.doc

And if your server supports URL rewriting (e.g. mod_rewrite in Apache) then you can fully hide the script part.

Characters in URLs should be in UTF-8, urlencoded byte-by-byte:

/mot%C3%B6rhead   # motörhead
查看更多
登录 后发表回答