Is there a way to check whether unicode text is in

Posted 2019-01-07 15:07

I'll be getting text from a user, and I need to validate that it is a Chinese character.

Is there any way I can check this?

8 answers
聊天终结者
#2 · 2019-01-07 15:37

This worked for me:

// Requires: using System.Globalization;
// Note: OtherLetter (Lo) also covers kana, Hangul, Arabic, Hebrew, Thai and more,
// so this is a loose test rather than a strictly Chinese one.
var isChineseTextPresent = false;

foreach (var character in text)
{
    if (char.GetUnicodeCategory(character) == UnicodeCategory.OtherLetter)
    {
        isChineseTextPresent = true;
        break;
    }
}
干净又极端
#3 · 2019-01-07 15:38

According to the information provided here on the Unicode website, you can find the block for Chinese (or any other script) and then check whether each character falls within that range, like this:

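// Requires: using System.Linq;
public bool IsChinese(string text)
{
    // U+4E00..U+9FFF is the basic CJK Unified Ideographs block. A char is a single
    // UTF-16 code unit (at most 0xFFFF), so it can never reach values such as 0x20000.
    return text.Any(c => c >= 0x4E00 && c <= 0x9FFF);
}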

Note that

As a handy reference, the Unicode Consortium here provides a search interface to the Unicode Hàn (漢) Database (Unihan).

The database link I'd provided above is showing you the characters

叛逆
#4 · 2019-01-07 15:38

According to Wikipedia (https://en.wikipedia.org/wiki/CJK_Compatibility), there are several character code ranges. Here is my approach to detecting Chinese characters based on the link above (the code is in F#, but it can easily be converted):

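let isChinese (text: string) =
    let inRange code =
        (code >= 0x4E00 && code <= 0x9FFF) ||   // CJK Unified Ideographs
        (code >= 0x3400 && code <= 0x4DBF) ||   // Extension A
        (code >= 0x20000 && code <= 0x2CEAF) || // Extensions B to E
        (code >= 0x2E80 && code <= 0x31EF) ||   // radicals, symbols, strokes (this span also includes kana)
        (code >= 0xF900 && code <= 0xFAFF) ||   // CJK Compatibility Ideographs
        (code >= 0xFE30 && code <= 0xFE4F) ||   // CJK Compatibility Forms
        (code >= 0x2F800 && code <= 0x2FA1F)    // CJK Compatibility Ideographs Supplement
    // A .NET char is a UTF-16 code unit, so decode surrogate pairs to reach code points above 0xFFFF.
    let rec loop i =
        if i >= text.Length then false
        else
            let isPair =
                i + 1 < text.Length
                && System.Char.IsHighSurrogate(text.[i])
                && System.Char.IsLowSurrogate(text.[i + 1])
            let code = if isPair then System.Char.ConvertToUtf32(text.[i], text.[i + 1]) else int text.[i]
            inRange code || loop (i + (if isPair then 2 else 1))
    loop 0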
唯我独甜
#5 · 2019-01-07 15:38

In Unicode, Chinese, Japanese, and Korean characters are encoded together.

See this FAQ: http://www.unicode.org/faq/han_cjk.html

Chinese characters are distributed across several blocks.

See this wiki page: https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

You will find several CJK character charts on the Unicode website.

For simplicity, you can just check against the minimum and maximum of the Chinese character range:

0x4e00 and 0x2fa1f. Keep in mind that this single span also contains non-Chinese characters, such as Hangul syllables.
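
The min/max check described in this answer could be sketched in C# roughly as follows. This is only an illustration under some assumptions: the class and method names (HanRangeCheck, ContainsCodePointInRange) are made up for this example, and the bounds are simply the two values quoted above. Because a C# char is a UTF-16 code unit, the sketch decodes surrogate pairs so that code points above 0xFFFF can actually be compared.

```csharp
using System;

public static class HanRangeCheck
{
    // Hypothetical helper: true if any code point lies in U+4E00..U+2FA1F,
    // the coarse minimum/maximum range quoted in the answer.
    public static bool ContainsCodePointInRange(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            int codePoint;

            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                // Valid surrogate pair: combine both code units into one code point.
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++; // skip the low surrogate
            }
            else if (char.IsSurrogate(c))
            {
                // Unpaired surrogate: not a real character, and its raw value
                // (0xD800..0xDFFF) would otherwise fall inside the range.
                continue;
            }
            else
            {
                codePoint = c;
            }

            if (codePoint >= 0x4E00 && codePoint <= 0x2FA1F)
            {
                return true;
            }
        }

        return false;
    }
}
```

For example, HanRangeCheck.ContainsCodePointInRange("abc 漢字") returns true and HanRangeCheck.ContainsCodePointInRange("abc") returns false. A Hangul syllable such as "한" also returns true, which shows how coarse a single range is.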

冷血范
#6 · 2019-01-07 15:45

You can use a regular expression that matches one of the supported named blocks:

private static readonly Regex cjkCharRegex = new Regex(@"\p{IsCJKUnifiedIdeographs}");
public static bool IsChinese(this char c)
{
    return cjkCharRegex.IsMatch(c.ToString());
}

Then, you can use:

if (sometext.Any(z=>z.IsChinese()))
     DoSomething();
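
As a variation on the same idea, the regex can be run once over the whole string instead of once per character, which avoids creating a one-character string for every char. The sketch below is an assumption-laden illustration rather than the answer's own code: the class and method names (CjkRegexCheck, ContainsHan) are hypothetical, and the choice of the two extra named blocks is mine. .NET named blocks are matched against UTF-16 code units, so ideographs outside the Basic Multilingual Plane (Extension B and later) are not covered by this pattern.

```csharp
using System.Text.RegularExpressions;

public static class CjkRegexCheck
{
    // Character class combining three named blocks from the .NET list of
    // supported named blocks. All three lie in the Basic Multilingual Plane.
    private static readonly Regex HanRegex = new Regex(
        @"[\p{IsCJKUnifiedIdeographs}\p{IsCJKUnifiedIdeographsExtensionA}\p{IsCJKCompatibilityIdeographs}]",
        RegexOptions.Compiled);

    // Hypothetical helper: true if the text contains at least one character
    // from any of the blocks above.
    public static bool ContainsHan(string text)
    {
        return HanRegex.IsMatch(text);
    }
}
```

Usage would look like: if (CjkRegexCheck.ContainsHan(sometext)) DoSomething();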
孤傲高冷的网名
#7 · 2019-01-07 15:48

Just check the characters to see whether their code points fall in the desired range(s). For example, see this question:

What's the complete range for Chinese characters in Unicode?
