How to Convert a UTF-8 ArrayBuffer to a UTF-16 JavaScript String

Posted 2019-06-07 10:38

Question:

The answers to the following question got me started on using ArrayBuffer:

Converting between strings and ArrayBuffers

However, they take quite a few different approaches. The main one is this:

function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i<strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}
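As a quick check, the sketch below runs these two helpers back to back. They form a matched pair because both sides treat the buffer as 16-bit UTF-16 code units in the platform's native byte order; neither one reads or writes UTF-8. The sample string is arbitrary. Note also that `String.fromCharCode.apply` can fail on very large buffers because engines limit the number of function arguments.

```javascript
// The two helpers quoted above, with the same behavior.
function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}

function str2ab(str) {
  var buf = new ArrayBuffer(str.length * 2); // 2 bytes per UTF-16 code unit
  var bufView = new Uint16Array(buf);
  for (var i = 0, strLen = str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

// Round trip: the bytes in between are raw UTF-16 code units in the
// platform's byte order, not UTF-8.
var original = "h\u00e9llo \u20ac"; // "héllo €", 7 code units
var buf = str2ab(original);
console.log(buf.byteLength);           // 14: 7 code units * 2 bytes
console.log(ab2str(buf) === original); // true
```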

I want to clarify the difference between UTF-8 and UTF-16 encoding, though, because I'm not 100% sure this is correct.

As I understand it, all JavaScript strings are UTF-16 encoded. But the raw bytes in your own ArrayBuffer can be in any encoding.
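To make "UTF-16 encoded" concrete: a JavaScript string is a sequence of 16-bit code units, so `length` and `charCodeAt` count code units rather than characters. The sketch below uses an arbitrary sample string containing one character outside the Basic Multilingual Plane, which needs two code units (a surrogate pair).

```javascript
// A JavaScript string is a sequence of 16-bit UTF-16 code units.
var s = "a\u00e9\u{1F600}"; // "a", "é", and an emoji above U+FFFF

console.log(s.length);                      // 4: the emoji uses two code units
console.log(s.charCodeAt(2).toString(16));  // "d83d" (high surrogate)
console.log(s.charCodeAt(3).toString(16));  // "de00" (low surrogate)
console.log(s.codePointAt(2).toString(16)); // "1f600" (the full code point)
```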

Say that I have received an ArrayBuffer in the browser from an XMLHttpRequest, and the bytes sent by the backend are UTF-8 encoded:

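var r = new XMLHttpRequest()
r.open('GET', '/x', true)
r.responseType = 'arraybuffer' // ask for the raw bytes instead of text
r.onload = function(){
  var b = r.response // ArrayBuffer holding the response body bytes
  if (!b) return
  var v = new Uint8Array(b) // byte-by-byte view of b
}
r.send(null)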

So now we have the ArrayBuffer b from the response r, wrapped in the Uint8Array view v.
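To see what v actually holds, the sketch below uses a hypothetical stand-in for `r.response`: the UTF-8 bytes a server would send for the two-character text "é€", produced here with `TextEncoder` (which always encodes to UTF-8). Each element of v is one byte, and UTF-8 uses one to four bytes per code point, so the byte count does not match the character count.

```javascript
// Hypothetical stand-in for new Uint8Array(r.response): the UTF-8 bytes
// for "é€". TextEncoder.encode already returns a Uint8Array.
var v = new TextEncoder().encode("\u00e9\u20ac");

// Two characters, but five bytes.
var hex = Array.from(v, function (x) { return x.toString(16); }).join(" ");
console.log(v.length); // 5
console.log(hex);      // "c3 a9 e2 82 ac"
```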

The question is: what should I do to convert this into a JavaScript string?

As I understand it, the raw bytes in v are UTF-8 encoded (they were sent to the browser as UTF-8). If we did this, though, I don't think it would work correctly:

function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}
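This suspicion can be checked directly. A `Uint16Array` view groups every two bytes into one 16-bit code unit, so UTF-8 input is not decoded at all; with an odd byte count the view cannot even be constructed and a `RangeError` is thrown. The byte values below are illustrative samples.

```javascript
function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}

// UTF-8 bytes for "hi" (0x68 0x69). The Uint16Array glues the pair into a
// single code unit, so the result is one unrelated character whose value
// depends on the platform's byte order.
var even = new Uint8Array([0x68, 0x69]).buffer;
var glued = ab2str(even);
console.log(glued.length);  // 1, not 2
console.log(glued === "hi"); // false

// UTF-8 bytes for "hi!": an odd byte length, so the view throws.
var odd = new Uint8Array([0x68, 0x69, 0x21]).buffer;
var threw = false;
try {
  ab2str(odd);
} catch (e) {
  threw = e instanceof RangeError;
}
console.log(threw); // true
```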

Since the bytes are UTF-8 and JavaScript strings are UTF-16, I think you need to do this instead:

function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint8Array(buf));
}
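Here is a small test of that idea, using illustrative byte values. Mapping each byte to its own character works for pure ASCII, where UTF-8 uses one byte per character, but any multi-byte sequence comes out as several Latin-1 characters (mojibake) instead of the intended one.

```javascript
function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint8Array(buf));
}

// Pure ASCII: one byte per character, so this happens to work.
var ascii = new Uint8Array([0x68, 0x69]).buffer; // UTF-8 for "hi"
console.log(ab2str(ascii)); // "hi"

// UTF-8 for "é" is the two bytes 0xC3 0xA9. Each byte becomes its own
// character (U+00C3 and U+00A9), giving "Ã©" instead of "é".
var accented = new Uint8Array([0xc3, 0xa9]).buffer;
var wrong = ab2str(accented);
console.log(wrong.length);             // 2
console.log(wrong === "\u00c3\u00a9"); // true
```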

That is, using Uint8Array instead of Uint16Array. So the first question is how to go from UTF-8 bytes to a JavaScript string.
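One standard option for this step is the `TextDecoder` API (part of the WHATWG Encoding Standard, available in modern browsers and Node.js), which decodes UTF-8 bytes into an ordinary JavaScript string. The wrapper name `utf8ToStr` and the sample bytes are hypothetical, for illustration only.

```javascript
// Decode UTF-8 bytes into a JavaScript (UTF-16) string.
// Hypothetical helper name; TextDecoder itself is a standard API.
function utf8ToStr(buf) {
  // decode() accepts an ArrayBuffer or any view such as a Uint8Array.
  return new TextDecoder("utf-8").decode(buf);
}

// UTF-8 bytes for "hi é€".
var bytes = new Uint8Array([0x68, 0x69, 0x20, 0xc3, 0xa9, 0xe2, 0x82, 0xac]);
var s = utf8ToStr(bytes.buffer);
console.log(s);        // "hi é€"
console.log(s.length); // 5
```

Inside the `onload` handler this would be `var text = utf8ToStr(r.response)`. By default, malformed byte sequences are replaced with U+FFFD; passing `{ fatal: true }` as the second constructor argument makes `decode` throw instead.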

The second question is how to go back from a JavaScript string to UTF-8 bytes. I am not sure this would encode correctly:

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i<strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}
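A quick comparison shows what this function actually produces: UTF-16 code units, two bytes each, rather than UTF-8. The character "€" (U+20AC) is a convenient sample because it is one UTF-16 code unit but three UTF-8 bytes. `TextEncoder` is used only as a UTF-8 reference; it always encodes to UTF-8.

```javascript
function str2ab(str) {
  var buf = new ArrayBuffer(str.length * 2); // 2 bytes for each code unit
  var bufView = new Uint16Array(buf);
  for (var i = 0, strLen = str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

var utf16Bytes = new Uint8Array(str2ab("\u20ac"));
var utf8Bytes = new TextEncoder().encode("\u20ac");
console.log(utf16Bytes.length); // 2 (0xAC 0x20 on little-endian machines)
console.log(utf8Bytes.length);  // 3 (0xE2 0x82 0xAC)
```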

However, I am not sure what to change in this one to get back a UTF-8 ArrayBuffer. Something like this seems incorrect:

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint8Array(buf);
  for (var i=0, strLen=str.length; i<strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}
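That intuition is right: assigning to a `Uint8Array` keeps only the low 8 bits of each value, so any code unit above 0xFF is silently truncated, and the buffer is still sized at two bytes per code unit. The sketch below contrasts it with `TextEncoder`, which does produce UTF-8. The names `str2abBroken` and `strToUtf8` are hypothetical, chosen for this illustration.

```javascript
// The Uint8Array variant from above, renamed for comparison.
function str2abBroken(str) {
  var buf = new ArrayBuffer(str.length * 2);
  var bufView = new Uint8Array(buf);
  for (var i = 0, strLen = str.length; i < strLen; i++) {
    // Only the low 8 bits are stored: 0x20AC becomes 0xAC.
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

// Encode a JavaScript string as UTF-8 and return an ArrayBuffer.
function strToUtf8(str) {
  var view = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  // Copy into an exactly sized ArrayBuffer in case the view covers only
  // part of its underlying buffer.
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
}

var broken = Array.from(new Uint8Array(str2abBroken("\u20ac")));
var fixed = Array.from(new Uint8Array(strToUtf8("\u20ac")));
console.log(broken); // [ 172, 0 ]: truncated byte plus an unused zero byte
console.log(fixed);  // [ 226, 130, 172 ]: 0xE2 0x82 0xAC
```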

Anyway, I am just trying to clarify exactly how to go from UTF-8 bytes, which encode a string from the backend, to a UTF-16 JavaScript string on the frontend.
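To show what that conversion does under the hood, here is a minimal hand-written UTF-8 decoder. It is a teaching sketch, not a replacement for `TextDecoder`: it assumes well-formed input and does not validate continuation bytes, overlong forms, or truncated sequences. The function name `decodeUtf8` is hypothetical. It reads the lead byte to learn the sequence length, assembles the code point from the payload bits, and lets `String.fromCodePoint` emit a surrogate pair for code points above U+FFFF.

```javascript
// Minimal UTF-8 decoder sketch; assumes well-formed input.
function decodeUtf8(bytes) {
  var out = "";
  var i = 0;
  while (i < bytes.length) {
    var b0 = bytes[i];
    var cp;
    if (b0 < 0x80) {
      // 0xxxxxxx: one byte (ASCII)
      cp = b0;
      i += 1;
    } else if (b0 < 0xe0) {
      // 110xxxxx 10xxxxxx: two bytes
      cp = ((b0 & 0x1f) << 6) | (bytes[i + 1] & 0x3f);
      i += 2;
    } else if (b0 < 0xf0) {
      // 1110xxxx followed by two continuation bytes
      cp = ((b0 & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) |
           (bytes[i + 2] & 0x3f);
      i += 3;
    } else {
      // 11110xxx followed by three continuation bytes
      cp = ((b0 & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    // Produces one UTF-16 code unit, or a surrogate pair above U+FFFF.
    out += String.fromCodePoint(cp);
  }
  return out;
}

// UTF-8 bytes for "é€😀": 2 + 3 + 4 bytes.
var sample = new Uint8Array([0xc3, 0xa9, 0xe2, 0x82, 0xac, 0xf0, 0x9f, 0x98, 0x80]);
console.log(decodeUtf8(sample)); // "é€😀"
```

In the request handler this would be called as `decodeUtf8(new Uint8Array(r.response))`.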