If Unicode uses 17-bit code points, how are surrogate pairs calculated from code points?
If it is code you are after, here is how a single code point is encoded in UTF-16 and UTF-8, respectively.
A single code point to UTF-16 code units:
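The original code for this step did not survive, so here is a minimal Python sketch of the same conversion, not the answerer's code. It assumes the input is one integer code point; the function name `codepoint_to_utf16` is hypothetical. Surrogate code points (0xD800 to 0xDFFF) are rejected because they cannot be encoded on their own.

```python
def codepoint_to_utf16(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code points are not scalar values: {cp:#x}")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: one code unit holds the value directly.
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value...
    v = cp - 0x10000
    # ...then the top 10 bits go in the high surrogate, the bottom 10 in the low.
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600, 0x10FFFF):
        units = codepoint_to_utf16(cp)
        print(f"U+{cp:04X} -> " + " ".join(f"{u:04X}" for u in units))
```

Running it prints `U+1F600 -> D83D DE00` and `U+10FFFF -> DBFF DFFF`, while BMP characters such as U+20AC come back as a single unit. The results can be cross-checked against Python's own codec via `chr(cp).encode("utf-16-be")`.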
A single code point to UTF-8 code units:
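As with the UTF-16 case, the original code is missing; this is a minimal Python sketch under the same assumptions (one integer code point in, surrogates rejected). The function name `codepoint_to_utf8` is hypothetical. Each branch uses the standard UTF-8 bit layout for 1, 2, 3 or 4 bytes.

```python
def codepoint_to_utf8(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-8 code units (bytes)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (7 payload bits)
        return [cp]
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx (11 payload bits)
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits)
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 payload bits)
    return [
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ]


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> " + " ".join(f"{b:02X}" for b in codepoint_to_utf8(cp)))
```

For example, U+20AC becomes `E2 82 AC` and U+1F600 becomes `F0 9F 98 80`, matching `list(chr(cp).encode("utf-8"))`. Note that UTF-8 has no surrogate pairs; a supplementary code point is written directly as four bytes.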
Unicode code points range from 0x000000 to 0x10FFFF, so they are 21-bit integers, not 17-bit. (Unicode scalar values are those same code points minus the surrogate range 0xD800 to 0xDFFF.)
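The arithmetic behind "21 bits" is short. The code space is 17 planes of 65,536 code points each, which is perhaps where the 17 in the question comes from, but that count is planes, not bits:

```latex
\begin{aligned}
17 \times 2^{16} &= 1\,114\,112 = \mathtt{0x110000} \text{ code points, so the largest is } \mathtt{0x10FFFF} = 1\,114\,111 \\
2^{20} = 1\,048\,576 &\le 1\,114\,111 < 2^{21} = 2\,097\,152
\end{aligned}
```

Since 0x10FFFF is at least 2^20 but below 2^21, representing every code point takes 21 bits.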
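Surrogate pairs are a mechanism of the UTF-16 encoding form, which represents each 21-bit scalar value as one or two 16-bit code units: values up to 0xFFFF use a single code unit, and values from 0x10000 to 0x10FFFF use a pair, a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF).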
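For a supplementary code point $c$ (0x10000 to 0x10FFFF), the pair calculation can be written as below, with the inverse on the last line. U+10437 is used as the worked example purely for illustration:

```latex
\begin{aligned}
v &= c - \mathtt{0x10000} && 0 \le v \le \mathtt{0xFFFFF} \text{ (20 bits)} \\
\mathit{high} &= \mathtt{0xD800} + \lfloor v / 2^{10} \rfloor && \text{top 10 bits of } v \\
\mathit{low} &= \mathtt{0xDC00} + (v \bmod 2^{10}) && \text{bottom 10 bits of } v \\
c &= \mathtt{0x10000} + (\mathit{high} - \mathtt{0xD800}) \cdot 2^{10} + (\mathit{low} - \mathtt{0xDC00}) && \text{decoding} \\[4pt]
c = \mathtt{U+10437}: \quad v &= \mathtt{0x0437},\ \lfloor v / 2^{10} \rfloor = 1,\ v \bmod 2^{10} = \mathtt{0x37} \\
\mathit{high} &= \mathtt{0xD801}, \quad \mathit{low} = \mathtt{0xDC37}
\end{aligned}
```

Subtracting 0x10000 first is what lets a 21-bit value fit: it leaves exactly 20 bits, split 10 and 10 across the two surrogates.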
This is explained in detail, with sample code, in the Unicode Consortium's FAQ, "UTF-8, UTF-16, UTF-32 & BOM". That FAQ points to the relevant section of the Unicode Standard, which goes into even more detail.