Getting the length of a UTF-8 string in C? [closed]

Posted 2019-09-20 18:21

Can this be done using a method similar to this one:

Loop over the string the user entered via scanf and, as long as the current element is not \0, add one to a "length" int; then print out the length.

I would be very grateful if anybody could guide me through the least complex way possible as I am a beginner.

Thank you very much, have a good one!

Tags: c string utf-8

1 answer

我想做一个坏孩纸

#2 · 2019-09-20 18:57

What do you mean by string length?

The number of bytes is easily obtained with strlen(s).
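As a rough illustration, the loop described in the question (advance until '\0', adding one each time) yields the same byte count as strlen. The function name byte_length_by_loop is made up for this sketch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts bytes by hand, the way the question describes.
   The result equals strlen(s): bytes, not characters. */
size_t byte_length_by_loop(const char *s) {
    size_t length = 0;
    while (s[length] != '\0') {
        length++;
    }
    return length;
}
```

For a pure ASCII string this matches the number of characters, but for "héllo" encoded in UTF-8 it returns 6, because é occupies two bytes.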

The number of code points encoded in UTF-8 can be computed by counting the single-byte characters (0x01 to 0x7F) and the lead bytes of multi-byte sequences (0xC0 to 0xFF, of which only 0xC2 to 0xF4 occur in well-formed UTF-8), ignoring continuation bytes (0x80 to 0xBF) and stopping at '\0'.

Here is a simple function to do this:

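#include <stddef.h>  /* size_t */

/* Count code points by counting every byte that is not a continuation byte. */
size_t count_utf8_code_points(const char *s) {
    size_t count = 0;
    while (*s) {
        /* Continuation bytes have the form 10xxxxxx; do not count them. */
        count += (*s++ & 0xC0) != 0x80;
    }
    return count;
}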

This function assumes that the contents of the array pointed to by s are properly encoded UTF-8.
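If the input cannot be trusted, one option is to check the structure of each sequence while counting. The following is only a sketch under stated assumptions: the name count_utf8_code_points_checked and the (size_t)-1 error value are choices made here, and the check is structural only (it does not reject overlong encodings, surrogates, or values above U+10FFFF).

```c
#include <stddef.h>

/* Sketch: counts code points, returning (size_t)-1 when a sequence is
   structurally malformed. Overlong forms and surrogates are NOT rejected. */
size_t count_utf8_code_points_checked(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (*p) {
        int trailing;  /* number of continuation bytes expected */
        if (*p < 0x80)                trailing = 0;  /* ASCII */
        else if ((*p & 0xE0) == 0xC0) trailing = 1;  /* 110xxxxx */
        else if ((*p & 0xF0) == 0xE0) trailing = 2;  /* 1110xxxx */
        else if ((*p & 0xF8) == 0xF0) trailing = 3;  /* 11110xxx */
        else return (size_t)-1;  /* stray continuation or invalid lead byte */
        p++;
        while (trailing-- > 0) {
            /* Each trailing byte must be 10xxxxxx; this also stops at '\0',
               so a truncated sequence never reads past the terminator. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            p++;
        }
        count++;
    }
    return count;
}
```

The cast to unsigned char keeps the comparisons independent of whether plain char is signed on the platform.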

Also note that this will compute the number of code points, not the number of characters displayed, as some of these may be encoded using multiple combining code points, such as <LATIN CAPITAL LETTER A> followed by <COMBINING ACUTE ACCENT>.
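To make the difference concrete, here is a small sketch with two strings that normally render as the same character, Á. The array names decomposed and precomposed are invented for this example, and the counting function is repeated so the block stands alone:

```c
#include <stddef.h>

/* Same counting approach as above: skip continuation bytes (10xxxxxx). */
size_t count_utf8_code_points(const char *s) {
    size_t count = 0;
    while (*s) {
        count += (*s++ & 0xC0) != 0x80;
    }
    return count;
}

/* U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE ACCENT */
static const char decomposed[] = "A\xCC\x81";

/* U+00C1 LATIN CAPITAL LETTER A WITH ACUTE, a single precomposed code point */
static const char precomposed[] = "\xC3\x81";
```

count_utf8_code_points(decomposed) gives 2 and count_utf8_code_points(precomposed) gives 1, although both display as one character. Counting displayed characters requires grapheme cluster segmentation, which the C standard library does not provide.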
