Can someone please confirm that all Kanji characters in Chinese are 3 bytes long in UTF-8?
相关问题
- UrlEncodeUnicode and browser navigation errors
- WebElement.getText() function and utf8
- How to convert a string to a byte array which is c
- Character Encoding in iframes
- Unicode issue with makemessages --all Django 1.6.2
相关文章
- iconv() Vs. utf8_encode()
- Why is `'↊'.isnumeric()` false?
- How to display unicode in SVG?
- When sending XML to JMS should I use TextMessage o
- Spanish Characters in HTML Page Title
- Google app engine datastore string encoding proble
- UnicodeEncodeError when saving ImageField containi
- How can i get know that my String contains diacrit
Yes, Kanji is U+4e00 to U+9faf, UTF8 3 bytes are U+0800 to U+FFFF.
The commonly used Hanzi/Kanji characters are in the "CJK Unified Ideographs" block between U+4E00 and U+9FFF, and take 3 bytes in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)
However, there are also some very rarely-used characters in the "CJK Unified Ideographs Extension B" and "CJK Compatibility Ideographs Supplement" blocks, which take 4 bytes in UTF-8.
Also be aware that Chinese text often contains ASCII characters like the digits 0-9.