U+4E00..U+9FFF is part of the complete set, but not all of it.
The Unicode code blocks that the other answers gave certainly cover most of the Chinese Unicode characters, but check out some of these other code blocks, too.
See my fuller discussion here. And this site is convenient for browsing Unicode.
The exact ranges for Chinese characters (except the extensions) are
[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]
[\u2E80-\u2FD5]
[\u3190-\u319f]
[\u3400-\u4DBF]
[\u4E00-\u9FCC]
For details, please refer to here; the extensions are covered in other answers.
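As a rough illustration, the character class above can be dropped into a Python 3 regular expression as is. This is only a sketch: the helper names `contains_cjk` and `extract_cjk` are made up for the example, and the pattern deliberately covers only the BMP ranges listed here, so characters from the extension blocks beyond the BMP will not match.

```python
import re

# Character class copied from the ranges above (BMP only; the
# supplementary-plane extensions are intentionally not included).
CJK_BMP = re.compile(r"[\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC]")


def contains_cjk(text):
    """Return True if any character of `text` falls in the ranges above."""
    return CJK_BMP.search(text) is not None


def extract_cjk(text):
    """Return every matching character of `text`, in order."""
    return CJK_BMP.findall(text)


if __name__ == "__main__":
    print(contains_cjk("hello 世界"))
    print(extract_cjk("a中b文c"))
```

In Python 3 a `str` is a sequence of code points, so the same approach extends to the extension blocks by adding `\U000XXXXX` escapes to the class.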
Unicode currently has 74605 CJK characters. CJK characters include not only characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.
1) 20941 characters from the CJK Unified Ideographs block.
Code points U+4E00 to U+9FCC.
2) 6582 characters from the CJKUI Ext A block.
Code points U+3400 to U+4DB5. Unicode 3.0 (1999).
3) 42711 characters from the CJKUI Ext B block.
Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).
4) 4149 characters from the CJKUI Ext C block.
Code points U+2A700 to U+2B734. Unicode 5.2 (2009).
5) 222 characters from the CJKUI Ext D block.
Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).
6) CJKUI Ext E block.
Coming soon
If the above is not spaghetti enough, take a look at known issues. Have fun =)
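The code point ranges in the list above can be checked mechanically. The following Python sketch hard-codes exactly those ranges (so it reflects the Unicode versions quoted in this answer, not necessarily the current standard), confirms that they add up to the 74605 figure, and offers a simple membership test. The names `CJK_RANGES`, `range_size`, and `is_cjk_ideograph` are illustrative only.

```python
# (block name, first code point, last code point), as listed in this answer.
CJK_RANGES = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FCC),
    ("CJKUI Ext A", 0x3400, 0x4DB5),
    ("CJKUI Ext B", 0x20000, 0x2A6D6),
    ("CJKUI Ext C", 0x2A700, 0x2B734),
    ("CJKUI Ext D", 0x2B740, 0x2B81D),
]


def range_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


def is_cjk_ideograph(ch):
    """True if the single character `ch` lies in one of the ranges above."""
    cp = ord(ch)
    return any(first <= cp <= last for _, first, last in CJK_RANGES)


TOTAL = sum(range_size(first, last) for _, first, last in CJK_RANGES)

if __name__ == "__main__":
    for name, first, last in CJK_RANGES:
        print(f"{name}: {range_size(first, last)}")
    print("total:", TOTAL)
```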
You may find a complete list through the CJK Unicode FAQ (which does cover "Chinese, Japanese, and Korean" characters).
The "East Asian Script" document does mention:
Table 12-2. Blocks Containing Han Ideographs
Note: the block ranges can evolve over time; the latest are in CJK Unified Ideographs.
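Because the ranges change between Unicode versions, one alternative to hard-coding them is to ask the Unicode database that ships with your runtime. The sketch below uses Python's standard `unicodedata` module and checks the character name prefix; the function name `is_unified_ideograph` is made up for the example. The result depends on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), and it only recognizes unified ideographs, not radicals or compatibility ideographs.

```python
import unicodedata


def is_unified_ideograph(ch):
    """True if `ch` is named 'CJK UNIFIED IDEOGRAPH-XXXX' in this Python's
    Unicode database."""
    # The second argument is the default returned for code points without a
    # name (for example, unassigned ones), which avoids a ValueError.
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for ch in ["中", "a", "\u3400", "\U00020000", "\u2F00"]:
        print(f"U+{ord(ch):04X}", is_unified_ideograph(ch))
```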
See also Wikipedia: