How to know the number of characters in a UTF-8 string

Is there a simple way to determine the number of characters in a UTF-8 string? For example, on Windows it can be done by:

converting the UTF-8 string to a wchar_t string
calling wcslen on the result

But I need a simpler, cross-platform solution.

Thanks in advance.

Tags: c string utf-8 character-encoding

3 Answers

Fickle 薄情

Post #2 · 2019-06-28 05:45

If the string is known to be valid UTF-8, simply take the length of the string in bytes, excluding bytes whose values are in the range 0x80-0xbf:

size_t i, cnt;
/* count every byte that is not a continuation byte (0x80-0xbf) */
for (cnt = i = 0; s[i]; i++)
    if (s[i] < 0x80 || s[i] > 0xbf) cnt++;

Note that s must point to an array of unsigned char in order for the comparisons to work.
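As a rough sketch of how the loop above might be wrapped for callers that only have a plain char pointer, the function below casts to unsigned char internally. The name utf8_count_range is hypothetical (not a standard library function), and the input is assumed to be a NUL-terminated, valid UTF-8 string.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated string assumed to be valid UTF-8.
   utf8_count_range is a hypothetical name, not a standard function. */
size_t utf8_count_range(const char *s)
{
    assert(s != NULL);
    /* View the bytes as unsigned so values >= 0x80 do not turn negative
       on platforms where plain char is signed. */
    const unsigned char *p = (const unsigned char *)s;
    size_t cnt = 0;
    for (size_t i = 0; p[i] != 0; i++) {
        /* Bytes 0x80-0xbf are continuation bytes; count everything else. */
        if (p[i] < 0x80 || p[i] > 0xbf)
            cnt++;
    }
    return cnt;
}
```

Calling utf8_count_range("h\xC3\xA9llo") should return 5, since the two bytes C3 A9 encode a single code point (U+00E9). The function does no validation, so malformed input gives a meaningless count.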


爱情/是我丢掉的垃圾

Post #3 · 2019-06-28 05:49

The entire concept of a "number of characters" does not really apply to Unicode, as code points do not map 1:1 to glyphs. The method proposed by @borrible is fine if you want to establish storage requirements in uncompressed form, but that is all that it can tell you.

For example, there are code points like the "zero width space", which take up no space on the screen when rendered but still occupy a code point, and there are combining marks for diacritics or vowel signs that attach to the preceding character. So any statistic would have to be specific to the concrete application.
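To make the mismatch concrete, here is a small sketch that counts code points in two strings whose on-screen appearance is shorter than the count suggests. The helper count_code_points and the two sample arrays are hypothetical names introduced only for this illustration, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points by skipping continuation bytes.
   Assumes s is a NUL-terminated, valid UTF-8 string. */
size_t count_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;
        if (b < 0x80 || b > 0xbf)
            n++;
    }
    return n;
}

/* "e" followed by U+0301 COMBINING ACUTE ACCENT (bytes CC 81).
   Typically rendered as one accented letter, but it is 2 code points. */
const char decomposed_e_acute[] = "e\xCC\x81";

/* "a", U+200B ZERO WIDTH SPACE (bytes E2 80 8B), "b".
   Typically rendered like "ab", but it is 3 code points.
   The literal is split so that 'b' is not read as part of the hex escape. */
const char a_zwsp_b[] = "a\xE2\x80\x8B" "b";
```

With these definitions, count_code_points(decomposed_e_acute) gives 2 and count_code_points(a_zwsp_b) gives 3, even though a reader would likely see one and two characters respectively.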

A proper Unicode renderer will have a function that can tell you how many pixels will be used for rendering a string if that information is what you're after.


小情绪 Triste *

Post #4 · 2019-06-28 06:04

A UTF-8 character is either a single byte whose leftmost bit is 0, or a sequence of bytes in which the first byte has the form 1..10... (two or more 1s on the left, then a 0) and every following byte has the form 10... (i.e. a single 1 on the left). Assuming that your string is well-formed, you can loop over all the bytes and increment your "character count" every time you see a byte that is not of the form 10... - i.e. count only the first byte of each UTF-8 character.
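The bit-pattern test described above could be sketched with a mask: a byte has the form 10... exactly when its top two bits are 10, i.e. (byte & 0xC0) == 0x80. The function name utf8_count_lead_bytes is hypothetical, and the string is assumed to be NUL-terminated and well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Count the first byte of each UTF-8 character in a well-formed,
   NUL-terminated string. utf8_count_lead_bytes is a hypothetical name. */
size_t utf8_count_lead_bytes(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* 0xC0 keeps the top two bits; 0x80 is the pattern 10xxxxxx,
           which marks a continuation byte. Count all other bytes. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For instance, utf8_count_lead_bytes("\xE2\x82\xAC") returns 1: the three bytes encode U+20AC, and only the first one (E2, pattern 1110...) is counted.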

