What characters are not encoded by encod

Posted 2019-05-02 07:41

Question:

I'm writing my own function in a different language, and I want it to provide identical results if possible.

Answer 1:

You can find information in the MDC documentation:

encodeURIComponent escapes all characters except the following:
alphabetic, decimal digits, - _ . ! ~ * ' ( )
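
One way to double-check that list is to run every ASCII character through encodeURIComponent and keep the ones that come back unchanged. This is only a small sketch; the variable name `unescaped` is my own and not part of any API.

```javascript
// Collect every ASCII character that encodeURIComponent leaves untouched.
let unescaped = "";
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURIComponent(ch) === ch) {
    unescaped += ch;
  }
}
console.log(unescaped);
```

The result should contain 71 characters: the 52 letters, the 10 digits, and the 9 marks listed above. Comparing this string against the output of your own function in the other language is a cheap regression test.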



Answer 2:

Short answer: you can match every UTF-16 code unit that encodeURIComponent would encode with the regular expression below:

/[^a-zA-Z0-9\-_.!~*'()]/g

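Note, though, that the regular expression works on individual UTF-16 code units, while the spec handles supplementary code points (surrogate pairs) as a whole: each pair is encoded as a single 4-byte UTF-8 sequence, so the two halves must not be escaped independently.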
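
To make the surrogate-pair caveat concrete, here is a short sketch comparing what the regular expression sees with what encodeURIComponent produces. The names `needsEscape`, `grin`, `matchedUnits`, `encodedPair`, and `loneSurrogateError` are illustrative only.

```javascript
// The regular expression from this answer, applied per UTF-16 code unit.
const needsEscape = /[^a-zA-Z0-9\-_.!~*'()]/g;

// U+1F600 is one code point but two UTF-16 code units (a surrogate pair).
const grin = "\uD83D\uDE00";

// The regex matches each half of the pair separately.
const matchedUnits = grin.match(needsEscape).length;

// encodeURIComponent combines the pair into one 4-byte UTF-8 sequence.
const encodedPair = encodeURIComponent(grin);

// A surrogate without its partner cannot be encoded and throws.
let loneSurrogateError = null;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateError = e.name;
}

console.log(matchedUnits, encodedPair, loneSurrogateError);
// logs: 2 %F0%9F%98%80 URIError
```

So a port that escapes match by match needs extra logic for the D800 to DFFF range, both to join valid pairs and to reject unpaired surrogates.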

Long answer: ECMA-262 says

15.1.3.4 encodeURIComponent (uriComponent)

The encodeURIComponent function computes a new version of a URI in which each instance of certain characters is replaced by one, two, three, or four escape sequences representing the UTF-8 encoding of the character. When the encodeURIComponent function is called with one argument uriComponent, the following steps are taken:

  1. Let componentString be ToString(uriComponent).

  2. Let unescapedURIComponentSet be a String containing one instance of each character valid in uriUnescaped.

  3. Return the result of calling Encode(componentString, unescapedURIComponentSet)

And uriUnescaped is defined thus:

uriUnescaped ::: uriAlpha | DecimalDigit | uriMark

where

uriAlpha ::: one of a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

uriMark ::: one of - _ . ! ~ * ' ( )

DecimalDigit ::: one of 0 1 2 3 4 5 6 7 8 9
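
Since the goal is a port to another language, the grammar above can be turned into a plain loop that avoids the built-in entirely. What follows is a sketch under my own assumptions, not the spec's algorithm text: `UNESCAPED`, `utf8Bytes`, and `myEncodeURIComponent` are hypothetical names, and `String(input)` only approximates the spec's ToString step.

```javascript
// uriUnescaped: uriAlpha | DecimalDigit | uriMark
const UNESCAPED =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789" +
  "-_.!~*'()";

// UTF-8 encode one code point into an array of byte values (1 to 4 bytes).
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Hypothetical stand-in for encodeURIComponent, written to be easy to port.
function myEncodeURIComponent(input) {
  const s = String(input); // approximation of ToString(uriComponent)
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    if (UNESCAPED.indexOf(ch) !== -1) {
      out += ch; // unescaped characters pass through unchanged
      continue;
    }
    let cp = s.charCodeAt(i);
    if (cp >= 0xdc00 && cp <= 0xdfff) {
      throw new URIError("unpaired low surrogate");
    }
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // A high surrogate must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        throw new URIError("unpaired high surrogate");
      }
      cp = (cp - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000;
      i++; // the low surrogate has been consumed
    }
    for (const b of utf8Bytes(cp)) {
      // Each byte becomes %XX with two uppercase hex digits.
      out += "%" + (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
    }
  }
  return out;
}

console.log(myEncodeURIComponent("a b&c=\u20AC")); // a%20b%26c%3D%E2%82%AC
```

The loop walks UTF-16 code units because that is how JavaScript strings are indexed. In a language whose strings iterate by code point or by UTF-8 byte, the surrogate branch disappears and only the membership test and the %XX formatting remain.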