With C++11, how can I, from a UTF-8 encoded std::string, get the Unicode value of each character of the text into a uint32_t?
Something like:
    void f(const std::string &utf8_str)
    {
        for(???) {
            uint32_t code = ???;
            /* Do my stuff with the code... */
        }
    }
Does assuming that the host system locale is UTF-8 help? What standard library tools does C++11 offer for the task?
You can simply convert the string into a UTF-32 encoded one, using the provided conversion facet (std::codecvt_utf8<char32_t>, from <codecvt>) and std::wstring_convert from <locale>:

Using <utf8.h> from http://utfcpp.sourceforge.net/ you could code:

This is extracted from some code licensed under GPLv3 that I will release in a few weeks or months.
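A minimal sketch of the std::wstring_convert approach from the first answer, filling in the question's `f`. The helper name `decode_utf8` is hypothetical. std::codecvt_utf8 always decodes UTF-8 regardless of the host locale, so no UTF-8 locale assumption is needed. Note that both std::wstring_convert and std::codecvt_utf8 were deprecated in C++17, though major standard libraries still ship them.

```cpp
#include <codecvt>
#include <cstdint>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 string into its Unicode code points.
// from_bytes() throws std::range_error if the input is not valid UTF-8.
std::vector<uint32_t> decode_utf8(const std::string &utf8_str)
{
    // codecvt_utf8<char32_t> converts between UTF-8 bytes and UTF-32 code
    // units; it does not depend on the global locale.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    std::u32string utf32_str = conv.from_bytes(utf8_str);
    // Each char32_t is one code point; copy them out as uint32_t.
    return std::vector<uint32_t>(utf32_str.begin(), utf32_str.end());
}

void f(const std::string &utf8_str)
{
    for (uint32_t code : decode_utf8(utf8_str)) {
        /* Do my stuff with the code... */
        (void)code;
    }
}
```

For example, `decode_utf8("a\xC3\xA9")` yields the two code points U+0061 and U+00E9.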