Converting to Base64 in JavaScript without Depreca

My name is Festus.

I need to convert strings to and from Base64 in a browser via JavaScript. The topic is covered quite well on this site and on Mozilla, and the suggested solution seems to be along these lines:

function toBase64(str) {
    return window.btoa(unescape(encodeURIComponent(str)));
}

function fromBase64(str) {
    return decodeURIComponent(escape(window.atob(str)));
}

I did a bit more research and found out that escape() and unescape() are deprecated and should no longer be used. With that in mind, I tried removing the calls to the deprecated functions, which yields:

function toBase64(str) {
    return window.btoa(encodeURIComponent(str));
}

function fromBase64(str) {
    return decodeURIComponent(window.atob(str));
}

This seems to work, but it raises the following questions:

(1) Why did the originally proposed solution include calls to escape() and unescape()? The solution was proposed prior to deprecation but presumably these functions added some kind of value at the time.

(2) Are there certain edge cases where my removal of these deprecated calls will cause my wrapper functions to fail?

NOTE: There are other, far more verbose and complex solutions on StackOverflow to the problem of string=>Base64 conversion. I'm sure they work just fine but my question is specifically related to this particular popular solution.

Thanks,

Festus

Tags: javascript encoding base64

1 answer

ゆ、 Hurt°

#2 · 2019-05-10 21:09

TL;DR: In principle escape()/unescape() are not necessary. Your second version, without the deprecated functions, round-trips safely, but it generates longer Base64 output:

console.log(decodeURIComponent(atob(btoa(encodeURIComponent("€uro")))))
console.log(decodeURIComponent(escape(atob(btoa(unescape(encodeURIComponent("€uro")))))))

Both print "€uro", but the version without escape()/unescape() produces a longer Base64 representation:

btoa(encodeURIComponent("€uro")).length // = 16
btoa(unescape(encodeURIComponent("€uro"))).length // = 8

The escape()/unescape() step becomes necessary only if the counterpart (e.g. a PHP script that you cannot change) expects the Base64 to be produced in that specific way, i.e. directly from the UTF-8 bytes of the string.
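To illustrate the compatibility point, here is a minimal sketch. counterpartDecode is a hypothetical stand-in (it is not part of the question) for a counterpart that decodes Base64 into bytes and reads them as UTF-8, which is my assumption about what such a script would typically do:

```javascript
// Hypothetical counterpart: Base64 -> bytes -> UTF-8 text (an assumption for illustration).
function counterpartDecode(b64) {
    const binary = atob(b64);
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    return new TextDecoder().decode(bytes);
}

const withUnescape = btoa(unescape(encodeURIComponent("€uro")));
const withoutUnescape = btoa(encodeURIComponent("€uro"));

console.log(counterpartDecode(withUnescape));    // "€uro"
console.log(counterpartDecode(withoutUnescape)); // "%E2%82%ACuro"
```

With the shorter wrapper such a counterpart receives the percent-encoded text rather than the original string, so it would have to URL-decode the result as an extra step.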

Long version:

First, to better understand the differences between the two versions of toBase64() and fromBase64() that you suggest above, let us have a look at btoa(), which is at the core of the issue. The documentation says that the name btoa is a mnemonic, so that

"b" can be considered to stand for "binary", and the "a" for "ASCII".

which is somewhat misleading, as the documentation hastens to add that

in practice, though, for primarily historical reasons, both the input and output of these functions are Unicode strings.

Worse still, btoa() only accepts

characters in the range U+0000 to U+00FF

Plainly speaking, only text in the Latin-1 range (ASCII plus some Western European characters such as "ä") works with btoa().
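A quick way to see this range limit is the small sketch below. tryBtoa is a hypothetical helper of mine that just reports the error name instead of letting the exception escape:

```javascript
// Hypothetical helper: returns the Base64 string, or the name of the error btoa() throws.
function tryBtoa(str) {
    try {
        return btoa(str);
    } catch (err) {
        return "error: " + err.name;
    }
}

console.log(tryBtoa("a")); // "YQ==" (U+0061, in range)
console.log(tryBtoa("ä")); // "5A==" (U+00E4, in range)
console.log(tryBtoa("€")); // an error, since U+20AC is out of range
```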

The purpose of encodeURIComponent(), which you have in both of your versions, is to help out with strings that contain characters outside the range U+0000 to U+00FF. An example would be the string "aä€", which has three characters:

a (U+0061)
ä (U+00E4)
€ (U+20AC)

Here only the first two characters are in range. The third character, the Euro sign, is outside it, and window.btoa("€") throws an out-of-range error. To avoid such an error, a way is needed to represent "€" within the set U+0000 to U+00FF. This is what window.encodeURIComponent does:

window.encodeURIComponent("aä€")
creates the following string:
"a%C3%A4%E2%82%AC" in which some characters have been encoded

a = a (stayed the same)
ä = %C3%A4 (changed to its utf8 representation)
€ = %E2%82%AC (changed to its utf8 representation)

The "changed to its utf8 representation" step works by writing the character "%" followed by a two-digit hexadecimal number for each byte of the character's UTF-8 representation. The "%" is U+0025 and hence allowed inside the btoa() range. The result of window.encodeURIComponent("aä€") can then be fed to btoa(), as it no longer contains any out-of-range characters:

btoa("a%C3%A4%E2%82%AC") // = "YSVDMyVBNCVFMiU4MiVBQw=="

The point of placing unescape() between btoa() and encodeURIComponent() is that, after encodeURIComponent(), every byte of the UTF-8 representation takes up three characters (%xx) in order to cover all potential byte values 0x00 to 0xFF. Here is where unescape() can play an optional role: it takes each such %xx sequence and puts in its place a single character in the allowed U+0000 to U+00FF range.

To check:

btoa(encodeURIComponent("aä€")).length // = 24
btoa(unescape(encodeURIComponent("aä€"))).length // = 8

The main difference is a shorter Base64 representation of the text, at the cost of an additional pass through the optional escape()/unescape(). For text that is mainly ASCII, the saving is minimal anyway.
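What unescape() does to the percent-encoded bytes can be inspected directly. This is only a small sketch, and the variable names are mine:

```javascript
const escaped = encodeURIComponent("€"); // "%E2%82%AC"
const collapsed = unescape(escaped);     // one character per UTF-8 byte

// Hex code of each character in the collapsed string.
const codes = Array.from(collapsed, (ch) => ch.charCodeAt(0).toString(16));

console.log(escaped.length);   // 9
console.log(collapsed.length); // 3
console.log(codes);            // ["e2", "82", "ac"]
```

Each of the three resulting characters lies in U+0000 to U+00FF, which is why btoa() accepts the collapsed string.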

The main lesson is that btoa() is misleadingly named and requires characters in the range U+0000 to U+00FF, which encodeURIComponent() by itself already produces. The deprecated escape()/unescape() pair only saves space, which may be desirable but is not necessary. The problem of Unicode characters above U+00FF is known as the btoa/atob "Unicode problem", and the documentation on it also mentions ways to Base64-encode arbitrary Unicode text as UTF-8 in modern browsers.
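One such way, sketched here under the assumption that TextEncoder and TextDecoder are available (they are in current browsers), avoids the deprecated functions entirely. The names toBase64Utf8 and fromBase64Utf8 are hypothetical, not taken from the question:

```javascript
// Hypothetical wrappers: Base64 over the UTF-8 bytes, without escape()/unescape().
function toBase64Utf8(str) {
    const bytes = new TextEncoder().encode(str); // UTF-8 bytes
    let binary = "";
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);     // one character per byte, all <= U+00FF
    }
    return btoa(binary);
}

function fromBase64Utf8(b64) {
    const binary = atob(b64);
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    return new TextDecoder().decode(bytes);
}

console.log(toBase64Utf8("€uro"));                 // "4oKsdXJv"
console.log(fromBase64Utf8(toBase64Utf8("€uro"))); // "€uro"
```

For this input the result matches the short output of the escape()/unescape() version, so it should be interchangeable with it wherever a counterpart expects Base64 of the UTF-8 bytes.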
