String encoding when constructing a Blob

Posted 2019-07-17 15:13

Question:

I know that JavaScript strings are usually encoded with an encoding that takes at least two bytes per character (UTF-16 or UCS-2).

However, a different encoding appears to be used when constructing a Blob: when I read it back as an ArrayBuffer, the returned buffer is 3 bytes long for a single euro sign.

var b = new Blob(['€']);

Answer 1:

According to the W3C File API specification, strings passed to the Blob constructor are encoded as UTF-8.
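One quick way to check this without FileReader is to compare the Blob's `size` with the output of `TextEncoder`, which always encodes to UTF-8. This is a minimal sketch that assumes `Blob` and `TextEncoder` are available as globals (modern browsers, or Node.js 18+). The helper name `describe` is hypothetical and exists only for this illustration.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Hypothetical helper: compare UTF-16 code units with UTF-8 bytes and Blob size.
function describe(str) {
  return {
    utf16CodeUnits: str.length,            // str.length counts UTF-16 code units
    utf8Bytes: encoder.encode(str).length, // bytes after UTF-8 encoding
    blobSize: new Blob([str]).size,        // bytes actually stored in the Blob
  };
}

// 'A' (U+0041), the euro sign (U+20AC), and an emoji outside the BMP (U+1F600)
for (const s of ['A', '\u20AC', '\u{1F600}']) {
  console.log(JSON.stringify(s), describe(s));
}
// Expected: A -> 1 / 1 / 1, euro -> 1 / 3 / 3, emoji -> 2 / 4 / 4
```

In every case the Blob size matches the UTF-8 byte count, not twice the string length.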

Demo:

// Create a Blob containing a euro sign (U+20AC)
var b = new Blob(['€']);
var fr = new FileReader();

fr.onload = function() {
  // Declare ua locally instead of leaking an implicit global
  var ua = new Uint8Array(fr.result);
  // This logs "3|226|130|172", i.e. the bytes E2 82 AC,
  // which are the UTF-8 encoding of U+20AC.
  // In UTF-16 it would be a single 16-bit code unit (0x20AC), only 2 bytes long.
  console.log(
    fr.result.byteLength + '|' +
    ua[0] + '|' +
    ua[1] + '|' +
    ua[2]
  );
};
fr.readAsArrayBuffer(b);
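In current browsers (and Node.js 18+) the same bytes can be read without FileReader through the Promise-based `Blob.prototype.arrayBuffer()`. This is a minimal sketch assuming such an environment; `blobBytes` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: resolve to the Blob's raw bytes as a plain array of numbers.
async function blobBytes(blob) {
  const buffer = await blob.arrayBuffer(); // Promise-based alternative to FileReader
  return Array.from(new Uint8Array(buffer));
}

const euroBlob = new Blob(['\u20AC']); // the euro sign, U+20AC

blobBytes(euroBlob).then((bytes) => {
  // Show each byte in decimal and as two-digit hex (e.g. 226 -> "e2")
  const hex = bytes.map((b) => b.toString(16).padStart(2, '0'));
  console.log(euroBlob.size, bytes, hex);
});
```

This reports the same three UTF-8 bytes as the FileReader demo, without the callback-based API.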
