templates and string literals and UNICODE

Posted 2020-02-11 07:12

Question:

NEW: Thank you everyone who helped me with this! The answer is marked below, and I've expanded on the answer with a functioning version in my question, below (q.v.):


I seem to be running into this situation a lot (while updating our string utilities library):

I need a way to have a template which works for both char and wchar_t, which uses various string-literals. Currently I'm finding this challenging because I don't know how to have a compile-time way to alter string literals to be narrow or wide character.

For consideration, take the following TCHAR based function:

// quote the given string in-place using the given quote character
inline void MakeQuoted(CString & str, TCHAR chQuote = _T('"'))
{
    if (str.IsEmpty() || str[0] != chQuote)
        str.Format(_T("%c%s%c"), chQuote, str, chQuote);
}

I want to template it instead:

// quote the given string in-place using the given quote character
template <typename CSTRING_T, typename CHAR_T>
inline void MakeQuoted(CSTRING_T & str, CHAR_T chQuote = '"')
{
    if (str.IsEmpty() || str[0] != chQuote)
        str.Format("%c%s%c", chQuote, str, chQuote);
}

Immediately we have a problem with the two string literals ('"', and "%c%s%c").

If the above is invoked for CSTRING_T = CStringA, CHAR_T = char, then the above literals are fine. But if it is invoked for CStringW and wchar_t, then I really need (L'"', and L"%c%s%c").

So I need some way to do something like:

template <typename CSTRING_T, typename CHAR_T>
inline void MakeQuoted(CSTRING_T & str, CHAR_T chQuote = Literal<CHAR_T>('"'))
{
    if (str.IsEmpty() || str[0] != chQuote)
        str.Format(Literal<CHAR_T>("%c%s%c"), chQuote, str, chQuote);
}

And that's where I am lost: What in the world can I do to make Literal(string-or-character-literal) that actually results in L"string" or "string" depending on CHAR_T?

Edit: There are over a hundred functions, many of them more complex with more string-literals in them, that need to be available both for narrow and wide strings. Short of copying every such function and then editing each one to either be wide or narrow, surely there is a technique that would allow a single definition that varies by CHAR_T?


I'm accepting the hybrid macro + template answer that Mark Ransom supplied, but I wanted to include a more complete solution (for anyone who cares), so here it is:

// we supply a few helper constructs to make templates easier to write
// this is sort of the dark underbelly of template writing
// to help make the c++ compiler slightly less obnoxious

// generates the narrow or wide character literal depending on T
// usage: LITERAL(charT, "literal text") or LITERAL(charT, 'c')
#define LITERAL(T,x) template_details::literal_traits<typename T>::choose(x, L##x)

namespace template_details {

    // Literal Traits uses template specialization to achieve templated narrow or wide character literals for templates
    // the idea came from me (Steven S. Wolf), and the implementation from Mark Ransom on stackoverflow (http://stackoverflow.com/questions/4261673/templates-and-string-literals-and-unicode)
    template<typename T>
    struct literal_traits
    {
        typedef char char_type;
        static const char * choose(const char * narrow, const wchar_t * wide) { return narrow; }
        static char choose(const char narrow, const wchar_t wide) { return narrow; }
    };

    template<>
    struct literal_traits<wchar_t>
    {
        typedef wchar_t char_type;
        static const wchar_t * choose(const char * narrow, const wchar_t * wide) { return wide; }
        static wchar_t choose(const char narrow, const wchar_t wide) { return wide; }
    };

} // template_details

In addition, I created some helpers to make writing templates that utilized this concept in conjunction with CStringT<> a bit easier / nicer to read & comprehend:

// generates the correct CString type based on char_T
template <typename charT>
struct cstring_type
{
    //  typedef CStringT< charT, ATL::StrTraitATL< charT, ATL::ChTraitsCRT< charT > > > type;
    // generate a compile time error if we're invoked on a charT that doesn't make sense
};

template <>
struct cstring_type<char>
{
    typedef CStringA type;
};

template <>
struct cstring_type<wchar_t>
{
    typedef CStringW type;
};

#define CSTRINGTYPE(T) typename cstring_type<T>::type

// returns an instance of a CStringA or CStringW based on the given char_T
template <typename charT>
inline CSTRINGTYPE(charT) make_cstring(const charT * psz)
{
    return psz;
}

// generates the character type of a given CStringT<>
#define CSTRINGCHAR(T) typename T::XCHAR

With the above, it is possible to write templates which generate the correct CString variety based on CStringT<> or char/wchar_t arguments. For example:

// quote the given string in-place using the given quote character
template <typename cstringT>
inline void MakeQuoted(cstringT & str, CSTRINGCHAR(cstringT) chQuote = LITERAL(CSTRINGCHAR(cstringT), '"'))
{
    if (str.IsEmpty() || str[0] != chQuote)
        str.Format(LITERAL(cstringT::XCHAR, "%c%s%c"), chQuote, str, chQuote);
}

// return a quoted version of the given string
template <typename cstringT>
inline cstringT GetQuoted(cstringT str, CSTRINGCHAR(cstringT) chQuote = LITERAL(CSTRINGCHAR(cstringT), '"'))
{
    MakeQuoted(str, chQuote);
    return str;
}

Answer 1:

The concept is to use a macro to generate both forms of the literal, char and wchar_t, then let a template function choose which one is appropriate for the context.

Remember that template functions don't actually generate any code until you have other code that makes a call to them. Most of the time this doesn't matter, but it would for a library.

This code is untested, but I believe it will work.

#define LITERAL(T,x) CString_traits<T>::choose(x, L##x)

template<typename T>
struct CString_traits
{
    typedef char char_type;
    static const char * choose(const char * narrow, const wchar_t * wide) { return narrow; }
    static char choose(char narrow, wchar_t wide) { return narrow; }
};

template<>
struct CString_traits<CStringW>
{
    typedef wchar_t char_type;
    static const wchar_t * choose(const char * narrow, const wchar_t * wide) { return wide; }
    static wchar_t choose(char narrow, wchar_t wide) { return wide; }
};

template <typename T>
inline void MakeQuoted(T & str, typename CString_traits<T>::char_type chQuote = LITERAL(T,'"'))
{
    if (str.IsEmpty() || str[0] != chQuote)
        str.Format(LITERAL(T,"%c%s%c"), chQuote, str, chQuote);
}
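
For anyone who wants to try the macro-plus-traits idea without ATL/MFC, here is a minimal sketch that swaps CString for std::basic_string. The names literal_chooser, PORTABLE_LITERAL, make_quoted and placeholder are my own illustrative assumptions, not part of the answer above or of any library. The last two lines show explicit instantiation, which is one way to make sure a compiled library actually contains code for both character types even when nothing inside the library calls the template.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout: none of these come from ATL/MFC.
template <typename CharT>
struct literal_chooser;  // deliberately undefined: only char and wchar_t are supported

template <>
struct literal_chooser<char> {
    static const char* choose(const char* narrow, const wchar_t*) { return narrow; }
    static char choose(char narrow, wchar_t) { return narrow; }
};

template <>
struct literal_chooser<wchar_t> {
    static const wchar_t* choose(const char*, const wchar_t* wide) { return wide; }
    static wchar_t choose(char, wchar_t wide) { return wide; }
};

// Emits both spellings of the literal; the traits class picks one of them.
#define PORTABLE_LITERAL(T, x) literal_chooser<T>::choose(x, L##x)

// Quote the string in place unless it already starts with the quote character.
template <typename CharT>
void make_quoted(std::basic_string<CharT>& str,
                 CharT quote = PORTABLE_LITERAL(CharT, '"')) {
    if (str.empty() || str[0] != quote) {
        str.insert(str.begin(), quote);
        str.push_back(quote);
    }
}

// A string literal goes through the same macro as a character literal.
template <typename CharT>
std::basic_string<CharT> placeholder() {
    return PORTABLE_LITERAL(CharT, "(empty)");
}

// Explicit instantiations: in a real library these would live in one .cpp file
// so that both versions are compiled into the binary.
template void make_quoted<char>(std::string&, char);
template void make_quoted<wchar_t>(std::wstring&, wchar_t);

void demo_make_quoted() {
    std::string narrow = "abc";
    make_quoted(narrow);
    assert(narrow == "\"abc\"");

    std::wstring wide = L"abc";
    make_quoted(wide, L'\'');
    assert(wide == L"'abc'");
}
```

Leaving the primary template undefined means that using any other character type fails at compile time instead of silently falling back to the narrow literal.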


Answer 2:

This piece is my own personal tiny little bit of genius.

#include <malloc.h>
template<typename to, int size> to* make_stack_temporary(const char(&lit)[size], to* memory = (to*)_alloca(sizeof(to)*size)) {
    for(int i = 0; i < size; i++)
        memory[i] = lit[i];
    return memory;
}

When you use alloca in a default argument, it's actually allocated off the caller's stack, allowing you to return arrays without resorting to the heap. No dynamic allocation, no memory freeing. _alloca is a CRT function provided by MSVC, so I don't give any portability guarantees - but if you're using ATL that's likely no problem anyway.

Of course, this also means that the pointer cannot be held past the calling function, but it should suffice for temporary uses like format strings. There are also some caveats to do with exception handling that you are unlikely to come across (check MSDN for details). It also only works for characters which have the same value in both the narrow and wide encodings; that holds for plain ASCII, but characters outside ASCII in a narrow literal depend on the code page and will not convert correctly this way.

I appreciate that this only solves a subset of the actual problems you may have encountered, but for that specific subset it's a far superior solution to the macro, or specifying every literal twice, etc.
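
If _alloca is not available, the same element-by-element copy can be sketched portably by returning a std::array by value, which also avoids the heap. This is a different technique from the one above (no default-argument trick), and widen_literal is a hypothetical name of my own; it assumes a C++11 compiler and ASCII-only literals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer): copies a narrow literal,
// including its terminating '\0', into a fixed-size array of To.
// Assumption: every character is plain ASCII, so its value is the same
// in the narrow and the wide encoding.
template <typename To, std::size_t N>
std::array<To, N> widen_literal(const char (&lit)[N]) {
    std::array<To, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = static_cast<To>(static_cast<unsigned char>(lit[i]));
    return out;
}

void demo_widen_literal() {
    // The array lives in this scope, so the pointer from data() stays
    // valid for as long as fmt does.
    const auto fmt = widen_literal<wchar_t>("%c%s%c");
    assert(fmt.size() == 7);  // six characters plus the terminator
    assert(std::wstring(fmt.data()) == L"%c%s%c");
}
```

The array is a normal local object, so the usual lifetime rules apply: keep it in a named variable if the pointer has to outlive the expression that created it.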

You can also use the definitely uglier but more behaviourally consistent aggregate initialization.

template<typename T> some_type some_func() {
    static const T array[] = { 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', ' ', 'l', 'i', 't', 'e', 'r', 'a', 'l', '\0' };
}

In C++0x with variadic templates, it may be possible for this solution to not suck. I'm CLOSE to a better solution which is C++03, but don't hold your breath.

Edit: You can do this, which IMO is the best solution, though it still involves some messing around.

#include <iostream>
#include <array>
#include <string>

struct something {
    static const char ref[];
};

const char something::ref[] = "";

template<int N, const char(*t_ref)[N], typename to> struct to_literal {
private:
    static to hidden[N];
public:
    to_literal() 
    : ref(hidden) {
        for(int i = 0; i < N; i++)
            hidden[i] = (*t_ref)[i];
    }
    const to(&ref)[N];
};
template<int N, const char(*t_ref)[N], typename to> to to_literal<N, t_ref, to>::hidden[];

template<int N, const char(&ref)[N], typename to> const to* make_literal() {
    return to_literal<N, &ref, to>().ref;
}

int main() {
    std::wcout << make_literal<sizeof(something::ref), something::ref, wchar_t>();
    std::wcin.get();
}

You have to go through every literal and make it a static member of a struct, then reference it, but it works much better.



Answer 3:

You don't need to use templates for something like this, considering there are only two ways to use MakeQuoted(). You can use function overloading for the same purpose:

inline void MakeQuoted(CStringA& str, char chQuote = '"') 
{ 
    if (str.IsEmpty() || str[0] != chQuote) 
        str.Format("%c%s%c", chQuote, str, chQuote); 
} 


inline void MakeQuoted(CStringW& str, wchar_t chQuote = L'"') 
{ 
    if (str.IsEmpty() || str[0] != chQuote) 
        str.Format(L"%c%s%c", chQuote, str, chQuote); 
} 

Surely this is the easiest way to do it without having to use macros, assuming that's your reason for attempting a template-based solution with your string utilities library.

You can factor out common functionality for long and complicated functions:

template<typename CStrT, typename CharT>
inline void MakeQuotedImpl(CStrT& str, CharT chQuote,
    const CharT* literal)
{
    if (str.IsEmpty() || str[0] != chQuote) 
        str.Format(literal, chQuote, str, chQuote); 

}

inline void MakeQuoted(CStringA& str, char chQuote = '"') 
{ 
    MakeQuotedImpl(str, chQuote, "%c%s%c");
} 


inline void MakeQuoted(CStringW& str, wchar_t chQuote = L'"') 
{
    MakeQuotedImpl(str, chQuote, L"%c%s%c");
} 


Answer 4:

I had a similar situation. I wrote one source file (plus a header file, of course) and excluded it from the build. I then created two other source files that pull in the original source via an #include directive. In one file I #define UNICODE (if not already defined) before the include; in the other I #undef UNICODE (if defined). The shared source file contains a few static structures and a number of functions whose text is identical for both character sets, although the compiled code differs. If every function takes either wchar_t or char as a parameter, this method yields either two sets of overloaded functions or two sets of differently named functions, depending on how the header file is written (take tchar.h as an example). Both the UNICODE and ANSI versions of the functions are then available to an application and, if the header file is written correctly, so is a default version for TCHAR. If you wish, I can elaborate on it; just say so.
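
The include-twice scheme needs several files, so here is a rough single-file approximation of the same idea: the function text is written once and compiled once per character type, with a macro standing in for the shared source file. It uses std::basic_string rather than CString, and DEFINE_STRING_UTILS, NARROW_LIT, WIDE_LIT, make_quoted and join_pair are hypothetical names I made up for the sketch.

```cpp
#include <cassert>
#include <string>

// Literal adapters: one leaves the literal alone, the other adds the L prefix.
#define NARROW_LIT(x) x
#define WIDE_LIT(x) L##x

// Hypothetical stand-in for the shared source file: the same function text
// is stamped out once per character type, producing overloaded functions.
#define DEFINE_STRING_UTILS(CharT, LIT)                                        \
    inline void make_quoted(std::basic_string<CharT>& str,                     \
                            CharT quote = LIT('"')) {                          \
        if (str.empty() || str[0] != quote) {                                  \
            str.insert(str.begin(), quote);                                    \
            str.push_back(quote);                                              \
        }                                                                      \
    }                                                                          \
    inline std::basic_string<CharT> join_pair(                                 \
        const std::basic_string<CharT>& a,                                     \
        const std::basic_string<CharT>& b) {                                   \
        return a + LIT(", ") + b;                                              \
    }

DEFINE_STRING_UTILS(char, NARROW_LIT)     // the "ANSI" build of the text
DEFINE_STRING_UTILS(wchar_t, WIDE_LIT)    // the "UNICODE" build of the text

void demo_string_utils() {
    std::string narrow = "abc";
    make_quoted(narrow);
    assert(narrow == "\"abc\"");

    std::wstring wide = L"abc";
    make_quoted(wide);
    assert(wide == L"\"abc\"");
}
```

With real files, the macro body would be the excluded source file and the two DEFINE_STRING_UTILS lines would be the two wrapper files that include it.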



Answer 5:

I believe you want the TEXT macro from the Windows SDK (the tchar.h equivalent is _T):

const TCHAR* psz = TEXT("Hello, generic string");


Answer 6:

You can use template specialization for MakeQuoted (function templates allow only full specialization, not partial), and quote based on the type.



Answer 7:

OK, so, if you really want to template this, I think the best thing I've been able to come up with is a templated class that stores your literals, based on this discussion. Something like this:

template <typename T> class Literal;
template <> class Literal<char>
{
public:
    static const char Quote = '"';
};
template <> class Literal<wchar_t>
{
public:
    static const wchar_t Quote = L'"';
};

Then, you'd use Literal<CHAR_T>::Quote in your non-specialized but templated functions. Kinda messy, I guess, but it has the benefit of leaving your function logic unduplicated and gives you templated string literals.
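
To show how this might extend from a single character to string literals, here is a small sketch using std::basic_string instead of CString. Literals, placeholder and quoted_or_placeholder are hypothetical names for illustration only. I used static member functions rather than static data members because string members would otherwise need out-of-class definitions before C++17.

```cpp
#include <cassert>
#include <string>

// Hypothetical literal store: one full specialization per character type.
// The primary template is left undefined so unsupported types fail to compile.
template <typename CharT> struct Literals;

template <> struct Literals<char> {
    static char quote() { return '"'; }
    static const char* placeholder() { return "(empty)"; }
};

template <> struct Literals<wchar_t> {
    static wchar_t quote() { return L'"'; }
    static const wchar_t* placeholder() { return L"(empty)"; }
};

// The function logic is written once; only the literals vary with CharT.
template <typename CharT>
std::basic_string<CharT> quoted_or_placeholder(const std::basic_string<CharT>& text) {
    typedef Literals<CharT> Lit;
    if (text.empty())
        return Lit::placeholder();
    return Lit::quote() + text + Lit::quote();
}

void demo_literals() {
    assert(quoted_or_placeholder(std::string("abc")) == "\"abc\"");
    assert(quoted_or_placeholder(std::wstring()) == L"(empty)");
}
```

The cost is that every literal has to be named and written twice in the store, but each one is written in exactly one place.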