How to convert unicode code points to utf-8 in c++

2020-03-05 03:30发布

I have an array consisting of unicode code points

unsigned short array[3]={0x20ac,0x20ab,0x20ac};

I just want this to be converted as utf-8 to write into file byte by byte using C++.

Example: 0x20ac should be converted to e2 82 ac.

or is there any other method that can directly write unicode characters in file.

7条回答
Deceive 欺骗
2楼-- · 2020-03-05 03:47

This code uses WideCharToMultiByte (I assume that you are using Windows):

unsigned short wide_str[3] = {0x20ac, 0x20ab, 0x20ac};
int utf8_size = WideCharToMultiByte(CP_UTF8, 0, wide_str, 3, NULL, 0, NULL, NULL) + 1;
char* utf8_str = calloc(utf8_size);
WideCharToMultiByte(CP_UTF8, 0, wide_str, 3, utf8_str, utf8_size, NULL, NULL);

You need to call it twice: first time to get number of output bytes, and second time to actually convert it. If you know output buffer size, you may skip first call. Or, you can simply allocate buffer 2x larger than original + 1 byte (for your case it means 12+1 bytes) - it should be always enough.

查看更多
forever°为你锁心
3楼-- · 2020-03-05 03:53

The term Unicode refers to a standard for encoding and handling of text. This incorporates encodings like UTF-8, UTF-16, UTF-32, UCS-2, ...

I guess you are programming in a Windows environment, where Unicode typically refers to UTF-16.

When working with Unicode in C++, I would recommend the ICU library.

If you are programming on Windows, don't want to use an external library, and have no constraints regarding platform dependencies, you can use WideCharToMultiByte.

Example for ICU:

#include <iostream>
#include <unicode\ustream.h>

using icu::UnicodeString;

int main(int, char**) {
    //
    // Convert from UTF-16 to UTF-8
    //
    std::wstring utf16 = L"foobar";
    UnicodeString str(utf16.c_str());
    std::string utf8;
    str.toUTF8String(utf8);

    std::cout << utf8 << std::endl;
}

To do exactly what you want:

// Assuming you have ICU\include in your include path
// and ICU\lib(64) in your library path.
#include <iostream>
#include <fstream>
#include <unicode\ustream.h>
#pragma comment(lib, "icuio.lib")
#pragma comment(lib, "icuuc.lib")

void writeUtf16ToUtf8File(char const* fileName, wchar_t const* arr, size_t arrSize) {
    UnicodeString str(arr, arrSize);
    std::string utf8;
    str.toUTF8String(utf8);

    std::ofstream out(fileName, std::ofstream::binary);
    out << utf8;
    out.close();
}
查看更多
太酷不给撩
4楼-- · 2020-03-05 03:54

Finally! With C++11!

#include <string>
#include <locale>
#include <codecvt>
#include <cassert>

int main()
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    std::string u8str = converter.to_bytes(0x20ac);
    assert(u8str == "\xe2\x82\xac");
}
查看更多
Juvenile、少年°
5楼-- · 2020-03-05 03:55

Following code may help you,

#include <atlconv.h>
#include <atlstr.h>

#define ASSERT ATLASSERT

int main()
{
    const CStringW unicode1 = L"\x0391 and \x03A9"; // 'Alpha' and 'Omega'

    const CStringA utf8 = CW2A(unicode1, CP_UTF8);

    ASSERT(utf8.GetLength() > unicode1.GetLength());

    const CStringW unicode2 = CA2W(utf8, CP_UTF8);

    ASSERT(unicode1 == unicode2);
}
查看更多
家丑人穷心不美
6楼-- · 2020-03-05 03:57
Anthone
7楼-- · 2020-03-05 04:00

Iconv is a popular library used on many platforms.

查看更多
登录 后发表回答