How can I determine the length (number of characters) of a std::wstring?
Using myStr.length() gives the byte size (I think), but it's not the number of characters. Do I need to write my own function to find the number of characters, or is there a native C++ or native WinAPI way?
If you want to know the length in wchar_t entities, use myStr.length(). If you want to know the size in Unicode code points, you'll have to find a library that knows how to count those. You could also write one yourself: the rules for determining whether a code point encoded as UTF-16 uses one or two entities are not too hard, see http://en.wikipedia.org/wiki/Utf-16. To know if your wchar_t is 16 bits (vs. 32 bits), use sizeof(wchar_t) == 2.

std::wstring::length()
will give you the number of characters, where a character is defined as the atomic unit of the wstring object, i.e. a wchar_t. This is what the Standard means when it refers to characters (see this post for some more details on the use of the word in the Standard).

However, when it comes to Unicode characters, whether one wchar_t corresponds to one Unicode character depends on the encoding used inside the wstring. If UTF-16 is used, which is often (but not necessarily) the case, one wchar_t will correspond to one Unicode character only for the Basic Multilingual Plane (i.e. all character sets derived from ISO-8859 as well as most of the commonly used CJK characters, but not some of the more exotic ones, e.g. classical Chinese characters)(*). If you want to get the character count right for all Unicode characters in that case, you need to use a Unicode-aware library (e.g. ICU), or code it yourself.

(*) There are additional problems if combining characters are used, as @一二三 correctly points out. Counting those correctly is also best done using appropriate libraries.
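
For the "code it yourself" route, here is a minimal sketch of counting code points. The function name count_code_points is made up for illustration. It assumes the wstring holds well-formed UTF-16 when wchar_t is 16 bits (as on Windows) and UTF-32 when it is wider; it does not validate the input and does not treat combining sequences as single characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a std::wstring.
// Assumption: UTF-16 if wchar_t is 16 bits, otherwise one code point per wchar_t.
// Assumption: the input is well-formed (no unpaired surrogates).
std::size_t count_code_points(const std::wstring& s) {
    if (sizeof(wchar_t) != 2) {
        // Wide enough to hold any code point directly.
        return s.length();
    }
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.length(); ++i) {
        const unsigned long unit = static_cast<unsigned long>(s[i]) & 0xFFFFu;
        // A low (trailing) surrogate, 0xDC00..0xDFFF, is the second half of
        // a surrogate pair, so it does not start a new code point.
        if (unit < 0xDC00u || unit > 0xDFFFu) {
            ++count;
        }
    }
    return count;
}
```

For example, count_code_points(L"a\U0001F600b") should return 3, whereas length() on the same string returns 4 when wchar_t is 16 bits, because U+1F600 lies outside the Basic Multilingual Plane and takes two UTF-16 units.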