XMLHttpRequest returns wrongly encoded characters

Posted 2019-06-25 08:58

I use XMLHttpRequest to read the PDF document http://www.virtualmechanics.com/support/tutorials-spinner/Simple2.pdf, which begins with

%PDF-1.3
%âãÏÓ
[...]

and print its content to the console:

var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
    if (xhr.readyState === 4 && xhr.status === 200) {
      console.log(xhr.responseText);
      console.log('âãÏÓ');
    }
};
xhr.open('GET', 'http://www.virtualmechanics.com/support/tutorials-spinner/Simple2.pdf', true);
xhr.send();

However, the console shows

%PDF-1.3
%����
[...]
âãÏÓ

(The last line comes from the reference console.log above, to verify that the console can actually display those characters.) Apparently the characters are wrongly decoded at some point. What is going wrong, and how can I fix it?

2 answers
Ridiculous、
#2 · 2019-06-25 09:38

XMLHttpRequest defaults to a text response, but a PDF is binary data, so decoding it as text corrupts every byte sequence that is not valid in the assumed encoding. Eric Bidelman describes how to work with binary responses.
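
To see why the output turns into replacement characters, here is a minimal sketch that can be run outside the browser. It assumes the text response is decoded as UTF-8 (which is what happens when the server sends no charset), and it assumes the PDF's second line contains the bytes E2 E3 CF D3, inferred from the âãÏÓ shown in the question:

```javascript
// Assumed bytes following '%' on the second line of the PDF header.
var bytes = new Uint8Array([0xE2, 0xE3, 0xCF, 0xD3]);

// Roughly what a text response does: decode the bytes as UTF-8.
// None of these bytes forms a valid UTF-8 sequence here, so each one
// is replaced by U+FFFD (the replacement character).
var asUtf8 = new TextDecoder('utf-8').decode(bytes);

// What a "binary string" does instead: one character per byte.
var asBinary = String.fromCharCode.apply(null, bytes);

console.log(asUtf8);
console.log(asBinary);
```

The first line printed is the garbled form seen in the console, the second is the intended âãÏÓ. The bytes themselves arrive intact; only the decoding step loses them.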

The solution is to request the response as a Blob, read the Blob's contents as a binary string, and pass that string to hash.update(..., 'binary'):

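// Assumes `crypto` is Node's crypto module (e.g. under NW.js or via a bundler)
// and `details.url` holds the URL to fetch.
var xhr = new XMLHttpRequest();
xhr.open('GET', details.url, true);
xhr.responseType = 'blob'; // keep the raw bytes instead of decoding them as text
xhr.onload = function() {
  if (this.status === 200) {
    var reader = new FileReader();
    // Register the handler before starting the read.
    reader.onload = function() {
      var hash = crypto.createHash('sha1');
      hash.update(reader.result, 'binary'); // one character per byte
      console.log(hash.digest('hex'));
    };
    reader.readAsBinaryString(this.response);
  }
};
xhr.send(null);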
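
If you would rather avoid FileReader, a possible alternative is to set responseType to 'arraybuffer' and convert the buffer yourself. The helper name arrayBufferToBinaryString is made up for illustration, and the XHR wiring is shown only in comments, so treat this as a sketch rather than a drop-in replacement:

```javascript
// Hypothetical helper: turn an ArrayBuffer into a "binary string"
// (one character per byte, code points 0 to 255).
function arrayBufferToBinaryString(buffer) {
  var bytes = new Uint8Array(buffer);
  var chunkSize = 0x8000; // convert in chunks to stay below argument-count limits
  var parts = [];
  for (var i = 0; i < bytes.length; i += chunkSize) {
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
  }
  return parts.join('');
}

// Possible use in the browser (not executed here):
//   xhr.responseType = 'arraybuffer';
//   xhr.onload = function() {
//     var binary = arrayBufferToBinaryString(this.response);
//     // pass `binary` to hash.update(binary, 'binary') as above
//   };
```

The chunking only matters for large files; for a small PDF a single String.fromCharCode.apply call would do.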
爷、活的狠高调
#3 · 2019-06-25 09:42

The character encoding of the response might not be UTF-8. The charset is a parameter of the MIME type, so try overriding it as shown here:

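var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://www.virtualmechanics.com/support/tutorials-spinner/Simple2.pdf', true);
// Must be called before send(); asks the browser to decode responseText as ISO-8859-1.
xhr.overrideMimeType('text/xml; charset=iso-8859-1');
xhr.send();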
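
A variant often used for binary data is charset=x-user-defined, which maps every byte to a code point whose low 8 bits equal that byte. Browsers generally treat the iso-8859-1 label as windows-1252, where bytes 0x80 to 0x9F map to unrelated characters, so masking would not recover those bytes. The helper name textToBytes is hypothetical, and the XHR lines appear only as comments:

```javascript
// Hypothetical helper: recover the raw bytes from a responseText that was
// decoded with charset=x-user-defined.
function textToBytes(text) {
  var bytes = new Uint8Array(text.length);
  for (var i = 0; i < text.length; i++) {
    // x-user-defined maps bytes 0x80..0xFF to U+F780..U+F7FF;
    // masking keeps only the original byte value.
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Possible use in the browser (not executed here):
//   xhr.overrideMimeType('text/plain; charset=x-user-defined');
//   xhr.onload = function() { var bytes = textToBytes(xhr.responseText); };
```

Where responseType = 'blob' or 'arraybuffer' is available, that is usually the simpler route; the override mainly helps when only responseText can be used.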