I am handling utf-8 strings in JavaScript and need to escape them.
Both escape() / unescape() and encodeURI() / decodeURI() work in my browser.
escape()
> var hello = "안녕하세요"
> var hello_escaped = escape(hello)
> hello_escaped
"%uC548%uB155%uD558%uC138%uC694"
> var hello_unescaped = unescape(hello_escaped)
> hello_unescaped
"안녕하세요"
encodeURI()
> var hello = "안녕하세요"
> var hello_encoded = encodeURI(hello)
> hello_encoded
"%EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94"
> var hello_decoded = decodeURI(hello_encoded)
> hello_decoded
"안녕하세요"
However, Mozilla says that escape() is deprecated.
Although encodeURI() and decodeURI() work with the above utf-8 string, the docs (as well as the function names themselves) tell me that these methods are for URIs; I do not see utf-8 strings mentioned anywhere.
Simply put, is it okay to use encodeURI() and decodeURI() for utf-8 strings?
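To make the "these methods are for URIs" concern concrete, here is a small sketch comparing encodeURI() and encodeURIComponent() on an illustrative string. The sample text is made up. It mixes Korean with the URI-reserved characters & and =. Both functions emit the same UTF-8 %XX escapes for the Korean syllables. The difference is that encodeURI() leaves the reserved characters alone.

```javascript
// Illustrative input (an assumption): Korean text plus URI-reserved characters.
const text = "a&b=안녕";

// encodeURI() leaves reserved characters such as & and = unescaped,
// because it assumes they are part of the URI's structure.
const viaEncodeURI = encodeURI(text);          // "a&b=%EC%95%88%EB%85%95"

// encodeURIComponent() escapes them as well, so the result can be
// embedded anywhere, e.g. as a query-string value.
const viaComponent = encodeURIComponent(text); // "a%26b%3D%EC%95%88%EB%85%95"

// Each output round-trips through its matching decoder.
console.log(decodeURI(viaEncodeURI) === text);          // true
console.log(decodeURIComponent(viaComponent) === text); // true
```

Note that decodeURI() deliberately leaves escapes of reserved characters alone. For example, decodeURI("%26") returns "%26". This is why each encoder should be paired with its own decoder.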
Yes, you should avoid both escape() and unescape().
Yes, but depending on the form of your input and the required form of your output, you may need some extra work.
From your question, I assume you have a JavaScript string, you want to convert its encoding to UTF-8, and you finally want to store the string in some escaped form.
First of all, it's important to note that the encoding of JavaScript strings is UCS-2, which is similar to UTF-16 and different from UTF-8.
See: https://mathiasbynens.be/notes/javascript-encoding
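A quick way to see the difference is to compare a string's length in 16-bit code units with its UTF-8 byte count. This sketch assumes TextEncoder is available, which is true in modern browsers and current Node.js. The emoji is an arbitrary example of a character outside the Basic Multilingual Plane.

```javascript
// "😀" (U+1F600) lies outside the Basic Multilingual Plane.
const emoji = "😀";

// JavaScript strings hold 16-bit code units, so this single character
// is stored as two of them (a UTF-16 surrogate pair).
console.log(emoji.length);                     // 2
console.log(emoji.charCodeAt(0).toString(16)); // "d83d"
console.log(emoji.charCodeAt(1).toString(16)); // "de00"

// In UTF-8 the same character takes four bytes.
console.log(encodeURIComponent(emoji));        // "%F0%9F%98%80"

// The Korean greeting from the question: 5 code units in JavaScript,
// but 15 bytes once encoded as UTF-8 (assumes TextEncoder exists).
console.log("안녕하세요".length);                     // 5
console.log(new TextEncoder().encode("안녕하세요").length); // 15
```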
encodeURIComponent() is good for the job, as it turns the UCS-2 JavaScript string into UTF-8 and escapes it as a sequence of %nn substrings, where each nn is the two hex digits of one byte. However, encodeURIComponent() does not escape letters, digits and a few other characters in the ASCII range (- _ . ! ~ * ' ( )). But this is easy to fix.
For example, if you want to turn a JavaScript string into an array of numbers representing the bytes of the original string encoded as UTF-8, you may use this function:
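Here is a minimal sketch of such a function, built on encodeURIComponent() as described. The name utf8Bytes is hypothetical. The function converts each %XX escape into its byte value. It also keeps each unescaped ASCII character as its own character code, because for ASCII that code equals its single UTF-8 byte.

```javascript
// Hypothetical helper name: returns the UTF-8 bytes of a JavaScript string.
function utf8Bytes(str) {
  // encodeURIComponent() writes UTF-8 bytes as %XX escapes, but leaves
  // ASCII letters, digits and - _ . ! ~ * ' ( ) as literal characters.
  // It throws a URIError if str contains a lone surrogate.
  const encoded = encodeURIComponent(str);
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === '%') {
      // "%XX": the next two characters are one byte in hex.
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      // Literal ASCII character: its code is also its UTF-8 byte.
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

console.log(utf8Bytes("안")); // [236, 149, 136]
```

The design choice of parsing the escapes, rather than maintaining an encoding table, keeps all the UTF-8 logic inside encodeURIComponent().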
If you want to turn the string in its hexadecimal representation:
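Here is one possible sketch. The name utf8Hex is hypothetical. For brevity, this version gets the bytes from TextEncoder instead of the encodeURIComponent() approach, which assumes a modern browser or current Node.js. It then appends two lowercase hex digits per byte inside a single for loop.

```javascript
// Hypothetical helper name: UTF-8 bytes of a string as lowercase hex.
function utf8Hex(str) {
  // Assumption: TextEncoder is available to produce the UTF-8 bytes.
  const u = new TextEncoder().encode(str);
  let s = '';
  for (let i = 0; i < u.length; i++) {
    // Bytes below 16 get a leading zero so every byte is two digits.
    s += (u[i] < 16 ? '0' : '') + u[i].toString(16);
  }
  return s;
}

console.log(utf8Hex("안녕하세요")); // "ec9588eb8595ed9598ec84b8ec9a94"
```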
If you change the line in the for loop into
s += '%' + ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );
(adding a % sign before each pair of hex digits), the resulting escaped string (UTF-8 encoded) can be turned back into a JavaScript UCS-2 string with decodeURIComponent().
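Here is a round-trip sketch of that variant. The name escapeEveryUtf8Byte is hypothetical, and the function again assumes TextEncoder is available. It escapes every UTF-8 byte as %xx, including ASCII letters and digits. decodeURIComponent() then restores the original string, because it accepts lowercase hex digits as well as escapes of unreserved characters.

```javascript
// Hypothetical helper name: escapes every UTF-8 byte as %xx, including the
// ASCII letters and digits that encodeURIComponent() would leave alone.
function escapeEveryUtf8Byte(str) {
  const u = new TextEncoder().encode(str); // assumption: TextEncoder exists
  let s = '';
  for (let i = 0; i < u.length; i++) {
    s += '%' + (u[i] < 16 ? '0' : '') + u[i].toString(16);
  }
  return s;
}

const escaped = escapeEveryUtf8Byte("Hi 안녕");
console.log(escaped);                     // "%48%69%20%ec%95%88%eb%85%95"
console.log(decodeURIComponent(escaped)); // "Hi 안녕"
```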
Hi!
When it comes to escape and unescape, I live by two rules:
Avoiding them when you easily can:
As mentioned in the question, both escape and unescape have been deprecated. In general, one should avoid using deprecated functions. So, if encodeURIComponent or encodeURI does the trick for you, you should use that instead of escape.
Using them when you can't easily avoid them:
Browsers will, as far as possible, strive to maintain backwards compatibility. All major browsers have already implemented escape and unescape; why would they un-implement them? Browsers would have to redefine escape and unescape if a new specification required them to do so. But wait! The people who write specifications are quite smart. They, too, are interested in not breaking backwards compatibility!
I realize that the above argument is weak. But trust me: when it comes to browsers, deprecated stuff works. This even includes deprecated HTML tags like <xmp> and <center>.
Using escape and unescape:
So naturally, the next question is: when would one use escape or unescape?
Recently, while working on CloudBrave, I had to deal with utf8, latin1 and conversions between them. After reading a bunch of blog posts, I realized how simple this was:
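A minimal sketch of the commonly cited idiom that combines these functions follows. The function names are hypothetical, and the code is not necessarily the exact code used in CloudBrave. In the encoding direction, encodeURIComponent() produces UTF-8 %XX escapes, and unescape() maps each %XX to the character with that code (0 to 255). The result is a latin1 "binary" string holding one UTF-8 byte per character. The decoding direction reverses both steps.

```javascript
// Hypothetical names. Encode: JavaScript string -> latin1 string whose
// characters (codes 0-255) are the UTF-8 bytes of the input.
function encodeUtf8(str) {
  // encodeURIComponent() yields %XX per UTF-8 byte (plus literal ASCII);
  // unescape() turns each %XX into the character with that code.
  return unescape(encodeURIComponent(str));
}

// Decode: latin1 byte string -> JavaScript string.
function decodeUtf8(bytes) {
  // escape() turns characters 128-255 (and some ASCII) back into %XX;
  // decodeURIComponent() then reads those bytes as UTF-8 and throws a
  // URIError if they are not valid UTF-8.
  return decodeURIComponent(escape(bytes));
}

console.log(encodeUtf8("é") === "\u00c3\u00a9");       // true ("Ã©")
console.log(decodeUtf8(encodeUtf8("안녕하세요")));      // "안녕하세요"
```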
These conversions, without using escape and unescape, are rather involved. By not avoiding escape and unescape, life becomes simpler.
Hope this helps.