What's the complete range for Chinese characters

Posted 2018-12-31 18:31

U+4E00..U+9FFF is part of the complete set, but not all of it.

Tags: unicode cjk
4 answers
还给你的自由
Reply #2 · 2018-12-31 19:12

The Unicode blocks that the other answers gave certainly cover most of the Chinese characters in Unicode, but check out some of these other blocks, too.

CJK_UNIFIED_IDEOGRAPHS
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
CJK_RADICALS_SUPPLEMENT
CJK_STROKES
CJK_SYMBOLS_AND_PUNCTUATION
ENCLOSED_CJK_LETTERS_AND_MONTHS
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
KANGXI_RADICALS
IDEOGRAPHIC_DESCRIPTION_CHARACTERS

See my fuller discussion here. And this site is convenient for browsing Unicode.
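
To make the block names listed above concrete, here is a minimal Python sketch that maps each name to a code point range and looks up which block a character falls in. The ranges are the ones I understand the Unicode `Blocks.txt` data file to assign, so check them against the current version; the `CJK_BLOCKS` table and the `block_of` helper are hypothetical names introduced only for this illustration.

```python
# Block ranges as I understand them from the Unicode Blocks.txt data file.
# CJK_BLOCKS and block_of are illustrative names, not part of any library.
CJK_BLOCKS = {
    "CJK_UNIFIED_IDEOGRAPHS": (0x4E00, 0x9FFF),
    "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A": (0x3400, 0x4DBF),
    "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B": (0x20000, 0x2A6DF),
    "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C": (0x2A700, 0x2B73F),
    "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D": (0x2B740, 0x2B81F),
    "CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E": (0x2B820, 0x2CEAF),
    "CJK_COMPATIBILITY": (0x3300, 0x33FF),
    "CJK_COMPATIBILITY_FORMS": (0xFE30, 0xFE4F),
    "CJK_COMPATIBILITY_IDEOGRAPHS": (0xF900, 0xFAFF),
    "CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT": (0x2F800, 0x2FA1F),
    "CJK_RADICALS_SUPPLEMENT": (0x2E80, 0x2EFF),
    "CJK_STROKES": (0x31C0, 0x31EF),
    "CJK_SYMBOLS_AND_PUNCTUATION": (0x3000, 0x303F),
    "ENCLOSED_CJK_LETTERS_AND_MONTHS": (0x3200, 0x32FF),
    "ENCLOSED_IDEOGRAPHIC_SUPPLEMENT": (0x1F200, 0x1F2FF),
    "KANGXI_RADICALS": (0x2F00, 0x2FDF),
    "IDEOGRAPHIC_DESCRIPTION_CHARACTERS": (0x2FF0, 0x2FFF),
}


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in CJK_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


print(block_of("\u4E2D"))      # CJK_UNIFIED_IDEOGRAPHS
print(block_of("\u3002"))      # CJK_SYMBOLS_AND_PUNCTUATION
print(block_of("\U00020000"))  # CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
print(block_of("A"))           # None
```

Note that several of these blocks (punctuation, enclosed letters, strokes) hold symbols rather than ideographs, so whether to include them depends on what you mean by "Chinese character".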

谁念西风独自凉
Reply #3 · 2018-12-31 19:16

The ranges for Chinese characters in the Basic Multilingual Plane (that is, excluding extensions B and later) are [\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC].

  1. [\u2E80-\u2FD5]

CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke. This range also takes in the Kangxi radicals themselves (U+2F00 to U+2FD5), which follow that block.

  2. [\u3190-\u319F]

Kanbun is a Unicode block containing annotation characters used in Japanese copies of classical Chinese texts, to indicate reading order.

  3. [\u3400-\u4DBF]

CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs.

  4. [\u4E00-\u9FCC]

CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.

For details, see here; the extensions are covered in the other answers.
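
As a rough illustration of how the character class in this answer could be used, here is a small Python sketch with the standard `re` module. The pattern is copied from the answer; the name `HAN_BMP` and the sample string are made up for the example.

```python
import re

# Character class taken from the answer above. It covers BMP ranges only,
# so the supplementary-plane extensions (B and later) are not matched.
HAN_BMP = re.compile(r"[\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC]+")

text = "Hello 中文 world 漢字!"
print(HAN_BMP.findall(text))  # ['中文', '漢字']

# U+20000 (Extension B) lies outside these ranges, so there is no match.
print(HAN_BMP.search("\U00020000"))  # None
```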

公子世无双
Reply #4 · 2018-12-31 19:30

Unicode currently has 74605 CJK characters. CJK characters include not only the characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.

1) 20941 characters from the CJK Unified Ideographs block.

Code points U+4E00 to U+9FCC.

  1. U+4E00 - U+62FF
  2. U+6300 - U+77FF
  3. U+7800 - U+8CFF
  4. U+8D00 - U+9FCC

2) 6582 characters from the CJKUI Ext A block.

Code points U+3400 to U+4DB5. Unicode 3.0 (1999).

3) 42711 characters from the CJKUI Ext B block.

Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).

  1. U+20000 - U+215FF
  2. U+21600 - U+230FF
  3. U+23100 - U+245FF
  4. U+24600 - U+260FF
  5. U+26100 - U+275FF
  6. U+27600 - U+290FF
  7. U+29100 - U+2A6DF

4) 4149 characters from the CJKUI Ext C block.

Code points U+2A700 to U+2B734. Unicode 5.2 (2009).

5) 222 characters from the CJKUI Ext D block.

Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).

6) CJKUI Ext E block.

Coming soon
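
The per-block counts quoted in this answer follow directly from the first and last assigned code points, and they add up to the 74605 total. A quick Python check, using only the ranges given above (the `RANGES` and `counts` names are just for this sketch):

```python
# First and last assigned code points, as quoted in the answer above.
RANGES = {
    "CJK Unified Ideographs": (0x4E00, 0x9FCC),
    "CJKUI Ext A": (0x3400, 0x4DB5),
    "CJKUI Ext B": (0x20000, 0x2A6D6),
    "CJKUI Ext C": (0x2A700, 0x2B734),
    "CJKUI Ext D": (0x2B740, 0x2B81D),
}

# Both endpoints are inclusive, hence the +1.
counts = {name: hi - lo + 1 for name, (lo, hi) in RANGES.items()}

for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", sum(counts.values()))  # total: 74605
```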

If the above is not spaghetti enough, take a look at known issues. Have fun =)

柔情千种
Reply #5 · 2018-12-31 19:36

Maybe you can find a complete list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).

The "East Asian Script" document does mention:

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.

Table 12-2. Blocks Containing Han Ideographs

Block                                   Range       Comment
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants

Note: the block ranges can evolve over time; the latest are in CJK Unified Ideographs.
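
For what it's worth, here is a minimal Python sketch that turns the block ranges from Table 12-2 into a single regular expression, including the supplementary-plane blocks. The names `HAN_RANGES`, `HAN`, and `count_han` are hypothetical, and the ranges are simply the ones in the table, so they would need updating if newer blocks should be covered.

```python
import re

# Block ranges copied from Table 12-2 above (whole blocks, including
# any code points inside them that are still unassigned).
HAN_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # Extension A
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B73F),  # Extension C
    (0x2B740, 0x2B81F),  # Extension D
    (0x2B820, 0x2CEAF),  # Extension E
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

# Build one character class such as "[\u4E00-\u9FFF...]" from the ranges.
HAN = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in HAN_RANGES) + "]"
)


def count_han(text):
    """Count the characters in text that fall inside the listed blocks."""
    return len(HAN.findall(text))


# Two BMP ideographs, one hiragana (not counted), one Extension B ideograph.
print(count_han("漢字と\U00020000 kanji"))  # 3
```

Python 3 strings are indexed by code point, so the supplementary-plane ranges work directly. In languages whose strings are UTF-16 code units, the same ranges would have to be handled through surrogate pairs or a code-point-aware regex mode.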

See also Wikipedia:
