Convert string from UTF-8 to ISO-8859-1

I'm trying to convert a UTF-8 string to an ISO-8859-1 char* for use in legacy code. The only way I'm seeing to do this is with iconv.

I would definitely prefer a completely string-based C++ solution, so that I can just call .c_str() on the resulting string.

How do I do this? Code example if possible, please. I'm fine using iconv if it is the only solution you know.

Tags: c++ utf-8 iso-8859-1 iconv

3 Answers

傲

Reply #2 · 2019-04-06 23:11

Alf's suggestion, implemented in C++11:

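#include <algorithm>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <iterator>
#include <locale>    // std::wstring_convert
#include <string>

int main()
{
    // u8"..." is a const char* in C++11/14/17; from C++20 on it is char8_t-based.
    auto i = u8"H€llo Wørld";
    // Decode UTF-8 into wide characters; from_bytes throws std::range_error on invalid input.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8;
    auto wide = utf8.from_bytes(i);
    std::string out;
    out.reserve(wide.length());
    // Keep code points 0..255 (Latin-1) and replace everything else with '?'.
    std::transform(wide.cbegin(), wide.cend(), std::back_inserter(out),
               [](const wchar_t c) { return (c <= 255) ? static_cast<char>(c) : '?'; });
    // out now contains "H?llo W\xf8rld"
}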


beautiful°

Reply #3 · 2019-04-06 23:27

First convert UTF-8 to 32-bit Unicode.

Then keep the values that are in the range 0 through 255.

Those are the Latin-1 code points. For any other value, decide whether you want to treat it as an error or replace it with code point 127 (my fav, the ASCII "del"), a question mark, or something else.

The C++ standard library defines a std::codecvt specialization (declared in <locale>) that can be used:

template<>
codecvt<char32_t, char, mbstate_t>

C++11 §22.4.1.4/3: “the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes”
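
The two steps above (decode to UTF-32, then keep only 0 through 255) could be sketched roughly as follows with that specialization. This is only an illustration, not a definitive implementation: the function name utf8_to_latin1 and the replacement parameter are made up for this example, and it assumes that appending a single replacement character is an acceptable way to signal that decoding stopped at an invalid or truncated sequence. Note also that this specialization is deprecated as of C++20.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::use_facet
#include <string>

// Hypothetical helper (name and signature are assumptions for this sketch).
std::string utf8_to_latin1(const std::string& utf8, char replacement = '?')
{
    std::string out;
    if (utf8.empty())
        return out;

    // The classic locale provides the UTF-8 <-> UTF-32 facet.
    using Facet = std::codecvt<char32_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    // A UTF-8 string never holds more code points than bytes.
    std::u32string wide(utf8.size(), U'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;
    const std::codecvt_base::result status =
        facet.in(state,
                 utf8.data(), utf8.data() + utf8.size(), from_next,
                 &wide[0], &wide[0] + wide.size(), to_next);

    // Keep only the code points that were actually decoded.
    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));

    // Latin-1 is exactly the code points 0..255; replace everything else.
    out.reserve(wide.size() + 1);
    for (char32_t c : wide)
        out.push_back(c <= 0xFF ? static_cast<char>(c) : replacement);

    // Assumption: mark a decoding failure with one trailing replacement.
    if (status != std::codecvt_base::ok)
        out.push_back(replacement);
    return out;
}
```

To hand the result to legacy code, store the returned std::string in a variable and pass its c_str(); the pointer is only valid while that string is alive and unmodified.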


不美不萌又怎样

Reply #4 · 2019-04-06 23:28

I'm going to modify my code from another answer to implement the suggestion from Alf.

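#include <string>

std::string UTF8toISO8859_1(const char * in)
{
    std::string out;
    if (in == NULL)
        return out;

    // Initialized so that a stray leading continuation byte does not read garbage.
    unsigned int codepoint = 0;
    while (*in != 0)
    {
        unsigned char ch = static_cast<unsigned char>(*in);
        if (ch <= 0x7f)
            codepoint = ch;                             // single-byte (ASCII)
        else if (ch <= 0xbf)
            codepoint = (codepoint << 6) | (ch & 0x3f); // continuation byte
        else if (ch <= 0xdf)
            codepoint = ch & 0x1f;                      // lead byte of a 2-byte sequence
        else if (ch <= 0xef)
            codepoint = ch & 0x0f;                      // lead byte of a 3-byte sequence
        else
            codepoint = ch & 0x07;                      // lead byte of a 4-byte sequence
        ++in;
        // The code point is complete once the next byte is not a continuation byte.
        if (((*in & 0xc0) != 0x80) && (codepoint <= 0x10ffff))
        {
            if (codepoint <= 255)
            {
                out.append(1, static_cast<char>(codepoint));
            }
            else
            {
                // do whatever you want for out-of-bounds characters
            }
        }
    }
    return out;
}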

Invalid UTF-8 input results in dropped characters.
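
If dropping characters silently is not acceptable, a stricter variant might look something like the sketch below. It is not the code from the answer above: the name utf8_to_latin1_strict and the replacement parameter are hypothetical, and it assumes a policy of emitting one replacement character for each malformed sequence and for each code point above 255.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stricter variant (name and policy are assumptions for this sketch).
std::string utf8_to_latin1_strict(const std::string& in, char replacement = '?')
{
    std::string out;
    out.reserve(in.size());

    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned int codepoint = 0;
        unsigned int min_value = 0;  // smallest value allowed for this length
        std::size_t extra = 0;       // continuation bytes expected after the lead

        if (lead <= 0x7f)
        {
            codepoint = lead;
        }
        else if (lead >= 0xc2 && lead <= 0xdf)
        {
            codepoint = lead & 0x1f; extra = 1; min_value = 0x80;
        }
        else if (lead >= 0xe0 && lead <= 0xef)
        {
            codepoint = lead & 0x0f; extra = 2; min_value = 0x800;
        }
        else if (lead >= 0xf0 && lead <= 0xf4)
        {
            codepoint = lead & 0x07; extra = 3; min_value = 0x10000;
        }
        else
        {
            // Stray continuation byte or a byte that can never start a sequence.
            out.push_back(replacement);
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            // A missing or non-continuation byte ends the sequence early;
            // that byte is left in place and decoded on the next iteration.
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xc0) != 0x80)
            {
                ok = false;
                break;
            }
            codepoint = (codepoint << 6) | (static_cast<unsigned char>(in[i]) & 0x3f);
            ++i;
        }

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (!ok || codepoint < min_value || codepoint > 0x10ffff ||
            (codepoint >= 0xd800 && codepoint <= 0xdfff))
            out.push_back(replacement);
        else if (codepoint <= 0xff)
            out.push_back(static_cast<char>(codepoint));
        else
            out.push_back(replacement);  // valid Unicode, but not in Latin-1
    }
    return out;
}
```

Taking a std::string instead of a const char* also lets the input contain embedded zero bytes; whether that matters depends on the legacy code that consumes the result.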
