utf8 <-> utf16: codecvt poor performance

I'm looking into some of my old (and exclusively Win32-oriented) code and thinking about making it more modern and portable, i.e. reimplementing some widely reusable parts in C++11. One of these parts is converting between UTF-8 and UTF-16. With the Win32 API I'm using MultiByteToWideChar/WideCharToMultiByte; to port that to C++11 I used the sample code from here: https://stackoverflow.com/a/14809553. The result is:

Release build (compiled with MSVS 2013, run on a Core i7-3610QM); times are totals for 1000 UTF-8 -> UTF-16 -> UTF-8 round trips:

stdlib                   = 1587.2 ms
Win32                    =  127.2 ms

Debug build:

stdlib                   = 5733.8 ms
Win32                    =  127.2 ms

The question is: is there something wrong with the code? If everything seems to be OK, is there a good reason for such a performance difference?

Test code is below:

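#include <windows.h>   // QueryPerformanceCounter, MultiByteToWideChar, ...
#include <cstdio>      // sprintf_s, fputs, fopen
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <clocale>
#include <codecvt>
#include <locale>      // std::wstring_convert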

// XU_BEGIN_TIMER opens a scope and starts a QueryPerformanceCounter timer;
// XU_END_TIMER stops it, prints the elapsed milliseconds and closes the scope.
#define XU_BEGIN_TIMER(NAME)                    \
    {                                           \
        LARGE_INTEGER   xu_freq;                \
        LARGE_INTEGER   xu_t0;                  \
        LARGE_INTEGER   xu_t1;                  \
        double          xu_tms;                 \
        const char*     xu_tname = NAME;        \
        char            xu_tbuf[0xff];          \
                                                \
        QueryPerformanceFrequency(&xu_freq);    \
        QueryPerformanceCounter(&xu_t0);

#define XU_END_TIMER()                          \
        QueryPerformanceCounter(&xu_t1);        \
        xu_tms = (xu_t1.QuadPart - xu_t0.QuadPart) * 1000.0 / xu_freq.QuadPart; \
        sprintf_s(xu_tbuf, sizeof(xu_tbuf), "    %-24s = %6.1f ms\n", xu_tname, xu_tms); \
        OutputDebugStringA(xu_tbuf);            \
        fputs(xu_tbuf, stdout);                 \
    }

std::string read_utf8() {
    // Binary mode: text mode on Windows would turn CRLF into LF.
    std::ifstream infile("C:/temp/UTF-8-demo.txt", std::ios::binary);
    std::string fileData((std::istreambuf_iterator<char>(infile)),
                         std::istreambuf_iterator<char>());
    infile.close();

    return fileData;
}

void testMethod() {
    // Has no effect on std::codecvt_utf8_utf16, which is locale-independent.
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::string source = read_utf8();
    {
        std::string utf8;

        XU_BEGIN_TIMER("stdlib") {
            for( int i = 0; i < 1000; i++ ) {
                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf16;
                std::u16string utf16 = convert2utf16.from_bytes(source);

                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf8;
                utf8 = convert2utf8.to_bytes(utf16);
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-std.dat", "wb");
        fwrite(utf8.c_str(), 1, utf8.length(), output);
        fclose(output);
    }

    char* utf8 = NULL;
    int cchA = 0;

    {
        XU_BEGIN_TIMER("Win32") {
            for( int i = 0; i < 1000; i++ ) {
                WCHAR* utf16 = new WCHAR[source.length() + 1];
                int cchW;
                utf8 = new char[source.length() + 1];

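                cchW = MultiByteToWideChar(
                    CP_UTF8, 0, source.c_str(), static_cast<int>(source.length()),
                    utf16, static_cast<int>(source.length() + 1));

                // For CP_UTF8 the last two arguments must be NULL.
                cchA = WideCharToMultiByte(
                    CP_UTF8, 0, utf16, cchW,
                    utf8, static_cast<int>(source.length() + 1), NULL, NULL);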

                delete[] utf16;
                if( i != 999 )
                    delete[] utf8;
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-win.dat", "wb");
        fwrite(utf8, 1, cchA, output);
        fclose(output);

        delete[] utf8;
    }
}
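Since the stated goal is portability, the Win32-only timer macros above could be replaced with std::chrono. A minimal sketch; `time_ms` and `report` are hypothetical helper names, not part of any library:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Portable stand-in for the XU_BEGIN_TIMER/XU_END_TIMER pair:
// runs `body` `iterations` times and returns the elapsed time in milliseconds.
template <class F>
double time_ms(F body, int iterations) {
    const auto t0 = std::chrono::steady_clock::now();  // monotonic clock
    for (int i = 0; i < iterations; ++i)
        body();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Prints one result line in the same format the original macro used.
void report(const char* name, double ms) {
    std::printf("    %-24s = %6.1f ms\n", name, ms);
}
```

Usage would look like `report("stdlib", time_ms([&] { /* one round trip */ }, 1000));`. One caveat: in Visual Studio 2013, `std::chrono::steady_clock` is not backed by QueryPerformanceCounter (that changed in Visual Studio 2015), so its resolution is coarser; for runs in the hundreds of milliseconds this matters little.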

Answer 1:

Since Vista, Win32's UTF-8 transcoding uses SSE internally to great effect, something very few other UTF transcoders do. I suspect it will be impossible to beat even with the most highly optimized portable code.
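The answer does not show how Windows uses SSE, but the core idea behind such fast paths can be illustrated portably: ASCII bytes (top bit clear) map one-to-one onto UTF-16 code units, so long ASCII runs can be detected many bytes at a time. A minimal SWAR (SIMD-within-a-register) sketch that tests 8 bytes per step; it illustrates the technique only and is not Windows' actual code, and `ascii_prefix_length` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the length of the leading pure-ASCII prefix of s[0..n).
// A byte is ASCII iff its top bit is clear, so a single AND against
// 0x8080...80 tests eight bytes at once.
std::size_t ascii_prefix_length(const char* s, std::size_t n) {
    const std::uint64_t high_bits = 0x8080808080808080ULL;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, s + i, 8);  // safe for unaligned data, no aliasing issues
        if (word & high_bits)
            break;  // a non-ASCII byte is somewhere in this word
    }
    // Finish (or locate the first non-ASCII byte) one byte at a time.
    while (i < n && (static_cast<unsigned char>(s[i]) & 0x80) == 0)
        ++i;
    return i;
}
```

A real vectorized transcoder does the same check with 16- or 32-byte SIMD registers and also widens the ASCII bytes to UTF-16 in bulk.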

However, the number you've given for codecvt, over 10x the Win32 time, is exceptionally slow and suggests a naive implementation. When writing my own UTF-8 decoder, I was able to get within 2-3x of Win32's performance. There's a lot of room for improvement here, but you'd need to implement a custom codecvt to get it.
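For reference, a straightforward scalar UTF-8 to UTF-16 decoder is short. A minimal sketch, not the answerer's implementation: `utf8_to_utf16` is a hypothetical name, it throws `std::range_error` on malformed input (as `std::wstring_convert` does when no error string is supplied), and it favours clarity over speed:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into UTF-16, rejecting bad lead/continuation bytes,
// truncated sequences, overlong forms, surrogates and values > U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // UTF-16 never needs more units than UTF-8 has bytes
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {  // 1-byte sequence: ASCII
            out.push_back(static_cast<char16_t>(c));
            ++i;
            continue;
        }
        std::size_t len;
        char32_t cp;
        char32_t min_cp;  // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else throw std::range_error("invalid UTF-8 lead byte");
        if (n - i < len)
            throw std::range_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::range_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::range_error("overlong, surrogate or out-of-range code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // supplementary plane: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

To use something like this through `std::wstring_convert` you would have to wrap it in a custom codecvt facet, as the answer says; calling it directly is simpler.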

Answer 2:

In my own testing, I found that constructing a wstring_convert has a massive overhead, at least on Windows. As other answers suggest, you'll probably struggle to beat the native Windows implementation, but your loop currently constructs two converters per iteration; try constructing a single one outside the loop instead. I expect you'll see an improvement of between 5x and 20x, particularly in a debug build.
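A sketch of the suggested change, assuming the same `codecvt_utf8_utf16<char16_t>` converter as the question; `round_trip` is a hypothetical helper name. One `wstring_convert` object can serve both directions, because `from_bytes` and `to_bytes` reset the conversion state before each call unless an explicit initial state was supplied:

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Round-trips UTF-8 -> UTF-16 -> UTF-8 `iterations` times, constructing the
// converter once instead of twice per iteration.
// Note: std::wstring_convert and the <codecvt> facets are deprecated since C++17.
std::string round_trip(const std::string& source, int iterations) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    std::string utf8;
    for (int i = 0; i < iterations; ++i) {
        std::u16string utf16 = convert.from_bytes(source);
        utf8 = convert.to_bytes(utf16);
    }
    return utf8;
}
```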