I'm downloading a ~50MB file in 5 MB chunks using XMLHttpRequest and the Range header. Things work great, except for detecting when I've downloaded the last chunk.
Here's a screenshot of the request and response for the first chunk. Notice the Content-Length is 1024 * 1024 * 5 (5 MB). Also notice that the server responds correctly with the first 5 MB and, in the Content-Range header, properly specifies the size of the entire file (after the /):
When I copy the response body into a text editor (Sublime), I only get 5,242,736 characters instead of the expected 5,242,880 as indicated by Content-Length:
Why are 144 characters missing? This is true of every chunk that gets downloaded, though the exact difference varies a little bit.
However, what's especially strange is the last chunk. The server responds with the last ~2.9 MB of the file (instead of a whole 5 MB) and apparently properly indicates this in the response:
Notice that I am requesting the next 5 MB (even though it goes beyond the total file size). No biggie, the server responds with the last part of the file and the headers indicate the actual byte range returned.
But does it really?
When I call xhr.getResponseHeader("Content-Length") with JavaScript, I see a different story in Chrome:
The XMLHttpRequest object is telling me that another 5 MB was downloaded, beyond the end of the file. Is there something I don't understand about the xhr object?
What's even weirder is that it works in Firefox 30 as expected:
So between the xhr.responseText.length not matching the Content-Length and these headers not agreeing between the xhr object and the Network tools, I don't know what to do to fix this.
What's causing these discrepancies?
Update: I have confirmed that the server itself is properly sending the response, despite the overshot Range header in the request for the last chunk. This is the output from the raw HTTP request, thanks to good ol' telnet:
HTTP/1.1 206 Partial Content
Server: nginx/1.4.5
Date: Mon, 14 Jul 2014 21:50:06 GMT
Content-Type: application/octet-stream
Content-Length: 2987360
Last-Modified: Sun, 13 Jul 2014 22:05:10 GMT
Connection: keep-alive
ETag: "53c30296-2fd9560"
Content-Range: bytes 47185920-50173279/50173280
So it looks like Chrome is malfunctioning. Should this be filed as a bug? Where?
The main issue is that you are reading binary data as text. Note that the server responds with Content-Type: application/octet-stream, which doesn't specify an encoding explicitly. In that case the browser will typically assume that the data is encoded in UTF-8. The length will mostly be unchanged: bytes with values 0 to 127 are interpreted as a single character in UTF-8, and bytes with higher values will usually be replaced by the replacement character �. However, your binary file will certainly contain a few valid multi-byte UTF-8 sequences, and each of these will be combined into one character. That explains why responseText.length doesn't match the number of bytes received from the server.
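To make the effect concrete, here is a small sketch that decodes a handful of hand-picked bytes as UTF-8 and compares the byte count with the resulting string length. It uses TextDecoder as a stand-in for the decoding the browser performs for responseText; the byte values are made up for illustration and are not taken from your file.

```javascript
// Ten raw bytes, as they might appear somewhere in a binary file.
var bytes = new Uint8Array([
  0x41,             // "A": ASCII, one byte -> one character
  0xC3, 0xA9,       // valid 2-byte UTF-8 sequence -> one character
  0xE2, 0x82, 0xAC, // valid 3-byte UTF-8 sequence -> one character
  0xFF,             // never valid in UTF-8 -> one replacement character
  0x80,             // stray continuation byte -> one replacement character
  0x42, 0x43        // "B", "C": ASCII again
]);

// Decode the bytes the way a text response would be decoded.
var text = new TextDecoder("utf-8").decode(bytes);

// Every valid multi-byte sequence collapses into a single character,
// so the string ends up shorter than the byte array.
var missing = bytes.length - text.length;
console.log(bytes.length + " bytes became " + text.length + " characters");
```

Only the two valid multi-byte sequences shrink the result; the invalid bytes are replaced one for one. A shortfall of 144 characters in a 5 MB chunk would be consistent with a small number of such accidental sequences.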
Now you could of course force a specific encoding using the request.overrideMimeType() method. ISO 8859-1 would make sense in particular, because the first 256 Unicode code points are identical with ISO 8859-1:
request.overrideMimeType("application/octet-stream; charset=iso-8859-1");
That should make sure that one byte will always be interpreted as one character. Still, a better approach would be storing the server response in an ArrayBuffer, which is explicitly meant to deal with binary data.
var request = new XMLHttpRequest();
request.open(...);
request.responseType = "arraybuffer";
request.onload = function () {
  // request.response is an ArrayBuffer; view it as raw bytes.
  var array = new Uint8Array(request.response);
  alert("First byte has value " + array[0]);
  alert("Array length is " + array.length);
};
request.send();
According to MDN, responseType = "arraybuffer" is supported starting with Chrome 10, Firefox 6 and Internet Explorer 10. See also: Typed arrays.
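For detecting the last chunk, one option is to combine the number of bytes actually received with the Content-Range header. The following is only a sketch: parseContentRange, isLastChunk and fetchChunk are hypothetical helper names, and url, chunkSize and onChunk are placeholders for whatever your code uses.

```javascript
// Parse a Content-Range value such as "bytes 47185920-50173279/50173280".
// Returns null if the header is missing or not in that form.
function parseContentRange(header) {
  var match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(header || "");
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: Number(match[3])
  };
}

// The last chunk is the one whose final byte is the final byte of the file.
function isLastChunk(range) {
  return range.end + 1 >= range.total;
}

// Hypothetical browser-side usage; not a drop-in replacement for your code.
function fetchChunk(url, start, chunkSize, onChunk) {
  var request = new XMLHttpRequest();
  request.open("GET", url);
  request.setRequestHeader("Range", "bytes=" + start + "-" + (start + chunkSize - 1));
  request.responseType = "arraybuffer";
  request.onload = function () {
    var bytes = new Uint8Array(request.response);
    var range = parseContentRange(request.getResponseHeader("Content-Range"));
    // A short chunk means the file ended, whatever the headers claim.
    var done = bytes.length < chunkSize || (range !== null && isLastChunk(range));
    onChunk(bytes, done);
  };
  request.send();
}
```

Checking bytes.length first means the loop still terminates if the headers reported by the xhr object are wrong, since the byte count comes from the data itself.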
Side-note: Firefox also supports responseType = "moz-chunked-text" and responseType = "moz-chunked-arraybuffer" starting with Firefox 9, which allow receiving data in chunks without resorting to ranged requests. It seems that Chrome doesn't plan to implement these; instead they are working on implementing the Streams API.
Edit: I was unable to reproduce your issue with Chrome lying to you about the response headers, at least not without your code. However, the code responsible should be this function in partial_data.cc:
// We are making multiple requests to complete the range requested by the user.
// Just assume that everything is fine and say that we are returning what was
// requested.
void PartialData::FixResponseHeaders(HttpResponseHeaders* headers,
                                     bool success) {
  if (truncated_)
    return;
  if (byte_range_.IsValid() && success) {
    headers->UpdateWithNewRange(byte_range_, resource_size_, !sparse_entry_);
    return;
  }
This code will remove the Content-Length and Content-Range headers returned by the server and replace them with ones generated from your request parameters. Given that I cannot reproduce the issue myself, the following are only guesses:
- This code path seems to be used only for requests that can be satisfied from cache, so I guess that things will work correctly if you clear your cache.
- The resource_size_ variable must have a wrong value in your case, larger than the actual size of the requested file. This variable is determined from the Content-Range header in the first chunk requested, so maybe you have a server response cached there which indicates a larger file.
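If you want to test the cache guess without clearing the cache by hand, one possibility is to make every chunk request look like a new resource. This is a sketch under the assumption that your server ignores unknown query parameters; withCacheBuster and the _nocache parameter name are made up for illustration, and the helper does not handle URLs containing a # fragment.

```javascript
// Hypothetical helper: append a throwaway query parameter so that the
// browser cache cannot match the request against an earlier response.
function withCacheBuster(url, token) {
  var separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "_nocache=" + encodeURIComponent(String(token));
}

// Possible usage in the browser (the path is a placeholder):
//   request.open("GET", withCacheBuster("/files/big.bin", Date.now()));
//   request.setRequestHeader("Range", "bytes=0-5242879");
```

If the headers reported by the xhr object match the Network tools once the cache is bypassed, that would support the idea that a stale cached entry is involved.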