I've been playing around with JS and can't figure out how it decides which elements to add to the created array when using Array.from(). For example, the following emoji
Array.from first tries to invoke the iterator of its argument if it has one. Strings do have iterators, so it invokes String.prototype[Symbol.iterator]. Let's look up how that prototype method works in the specification.

Looking up CreateStringIterator eventually takes you to 21.1.5.2.1 %StringIteratorPrototype%.next ( ), which reads the code point at the current position and then advances the position by that code point's CodeUnitCount.

The CodeUnitCount is what you're interested in. This number comes from CodePointAt.

So, when iterating over a string with Array.from, CodePointAt returns a CodeUnitCount of 2 only when the code unit at the current position is the start of a surrogate pair. A surrogate pair is a leading surrogate (a code unit in the range 0xD800 to 0xDBFF) immediately followed by a trailing surrogate (0xDC00 to 0xDFFF).

षि is not a surrogate pair: it consists of two separate code points, U+0937 and U+093F, each stored in a single code unit.

But
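The stepping logic described above can be sketched in plain JavaScript. This is only an approximation of what the spec describes, not the engine's actual implementation: splitLikeStringIterator is a hypothetical helper name, and the thumbs-up emoji is just a sample character chosen for illustration (it is not necessarily the one from the question).

```javascript
// Hypothetical helper (not a built-in): mimics how the string iterator
// advances through a string, one code point at a time.
function splitLikeStringIterator(str) {
  const result = [];
  let position = 0;
  while (position < str.length) {
    const first = str.charCodeAt(position);
    let codeUnitCount = 1;
    // A leading surrogate followed by a trailing surrogate forms a pair,
    // so the step is 2 code units instead of 1.
    if (first >= 0xd800 && first <= 0xdbff && position + 1 < str.length) {
      const second = str.charCodeAt(position + 1);
      if (second >= 0xdc00 && second <= 0xdfff) {
        codeUnitCount = 2;
      }
    }
    result.push(str.slice(position, position + codeUnitCount));
    position += codeUnitCount;
  }
  return result;
}

const thumbsUp = "\u{1F44D}";      // sample emoji, stored as a surrogate pair
const devanagari = "\u0937\u093F"; // षि, two code points of one code unit each

console.log(splitLikeStringIterator(thumbsUp).length);   // 1
console.log(splitLikeStringIterator(devanagari).length); // 2
console.log(Array.from(thumbsUp).length);                // 1
console.log(Array.from(devanagari).length);              // 2
```

Both strings have a length of 2, but only the emoji's two code units form a surrogate pair, which is why it comes out as a single element.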
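It's all about the code units behind the characters. Some characters are encoded in UTF-16 as two 16-bit code units (a surrogate pair), and Array.from returns each such pair as a single element. Something that looks like one character but is made of two code points is returned as two elements. Check the character lists:
http://www.fileformat.info/info/charset/UTF-8/list.htm
http://www.fileformat.info/info/charset/UTF-16/list.htm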
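Rather than looking characters up in a table, you can also inspect a string directly. A minimal sketch, assuming a hypothetical helper named describeCodePoints, that lists each code point together with the number of UTF-16 code units it occupies:

```javascript
// Hypothetical helper: returns one "U+XXXX:n" label per code point,
// where n is the number of UTF-16 code units used to store it.
function describeCodePoints(str) {
  return Array.from(str, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return "U+" + hex + ":" + ch.length;
  });
}

console.log(describeCodePoints("\u0937\u093F").join(" ")); // U+0937:1 U+093F:1
console.log(describeCodePoints("\u{1F44D}").join(" "));    // U+1F44D:2
```

The first string (षि) is two code points of one code unit each, so Array.from yields two elements. The second is a sample emoji stored as one code point in two code units, so it yields one.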
UTF-16 (the encoding used for strings in JS) uses 16-bit code units. Every Unicode code point that fits in 16 bits is represented as one code unit; everything else is represented as two code units, known as a surrogate pair. The iterator of strings iterates over code points.
UTF-16 on Wikipedia
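To make the one-versus-two split concrete, here is a rough sketch of the UTF-16 encoding arithmetic. toUtf16CodeUnits is a hypothetical helper written for illustration, and it assumes its input is a valid code point between 0 and 0x10FFFF.

```javascript
// Hypothetical helper: encodes a single code point as UTF-16 code units.
function toUtf16CodeUnits(codePoint) {
  if (codePoint <= 0xffff) {
    return [codePoint]; // fits in one 16-bit code unit
  }
  const offset = codePoint - 0x10000;      // leaves a 20-bit value
  const lead = 0xd800 + (offset >> 10);    // top 10 bits go in the leading surrogate
  const trail = 0xdc00 + (offset & 0x3ff); // bottom 10 bits go in the trailing surrogate
  return [lead, trail];
}

const toHex = (units) => units.map((u) => u.toString(16)).join(" ");

console.log(toHex(toUtf16CodeUnits(0x0937)));  // 937
console.log(toHex(toUtf16CodeUnits(0x1f44d))); // d83d dc4d
```

The second result matches what the engine stores: "\u{1F44D}".charCodeAt(0) is 0xd83d and "\u{1F44D}".charCodeAt(1) is 0xdc4d.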