locale-dependent ordering for std::string

Posted 2019-02-12 17:20

I am trying to compare std::strings in a locale-dependent manner.

For ordinary C-style strings, I've found strcoll, which does exactly what I want once the locale has been set with std::setlocale:

#include <clocale>   // std::setlocale
#include <cstring>   // strcoll
#include <iostream>

bool cmp(const char* a, const char* b)
{
    return strcoll(a, b) < 0;
}

int main()
{
    const char* s1 = "z", *s2 = "å", *s3 = "ä", *s4 = "ö";

    std::cout << (cmp(s1,s2) && cmp(s2,s3) && cmp(s3,s4)) << "\n"; //Outputs 0
    std::setlocale(LC_ALL, "sv_SE.UTF-8");
    std::cout << (cmp(s1,s2) && cmp(s2,s3) && cmp(s3,s4)) << "\n"; //Outputs 1, like it should

    return 0;
}

However, I'd like to have this behaviour for std::string as well. I could just overload operator< to do something like

bool operator<(const std::string& a, const std::string& b)
{
    return strcoll(a.c_str(), b.c_str()) < 0;
}

but then I'd have to worry about code using std::less and std::string::compare, so it doesn't feel right.

Is there a way to make this kind of collation work for strings in a seamless manner?

4 answers
一夜七次
#2 · 2019-02-12 17:49

The C++ library provides the collate facet to do locale-specific collation.
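
As a minimal sketch of what using that facet could look like for std::string: the helper name collate_less is hypothetical, and the default argument assumes you want whatever the global locale is at the time of the call.

```cpp
#include <locale>
#include <string>

// Locale-aware "less than" for std::string, built on the std::collate facet.
// collate_less is a hypothetical helper name, not part of the standard library.
bool collate_less(const std::string& a, const std::string& b,
                  const std::locale& loc = std::locale())
{
    // Fetch the collation facet of the chosen locale.
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);

    // compare() returns a negative, zero, or positive value, like strcoll.
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size()) < 0;
}
```

After std::locale::global(std::locale("sv_SE.UTF-8")), a call such as collate_less(s1, s2) should follow Swedish ordering, provided that locale is installed on the system.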

Evening l夕情丶
#3 · 2019-02-12 17:50

After a bit of searching around I realized that one way to do it could be to instantiate the std::basic_string template with a custom traits class, giving a new, localized string type.

There are probably a gazillion bugs in this, but as a proof of concept:

#include <iostream>
#include <locale>
#include <string>

struct localed_traits: public std::char_traits<wchar_t>
{
    static bool lt(wchar_t a, wchar_t b)
    {
        const std::collate<wchar_t>& coll =
            std::use_facet< std::collate<wchar_t> >(std::locale());
        return coll.compare(&a, &a+1, &b, &b+1) < 0;
    }

    static int compare(const wchar_t* a, const wchar_t* b, size_t n)
    {
        const std::collate<wchar_t>& coll =
            std::use_facet< std::collate<wchar_t> >(std::locale());
        return coll.compare(a, a+n, b, b+n);
    }
};

typedef std::basic_string<wchar_t, localed_traits> localed_string;

int main()
{
    localed_string s1 = L"z", s2 = L"å", s3 = L"ä", s4 = L"ö";

    std::cout << (s1 < s2 && s2 < s3 && s3 < s4 ) << "\n"; //Outputs 0
    std::locale::global(std::locale("sv_SE.UTF-8"));
    std::cout << (s1 < s2 && s2 < s3 && s3 < s4 ) << "\n"; //Outputs 1

    return 0;
}

However, it doesn't seem to work if you base it on char instead of wchar_t, and I have no idea why...
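
One likely explanation, assuming the strings are UTF-8 encoded: std::basic_string::compare passes only the first min(size(), other.size()) units to traits::compare, and with char each of å, ä and ö occupies two bytes. The sketch below just makes that count visible; compared_units is a hypothetical helper, and the byte escapes are the UTF-8 encoding of å.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of char units that std::basic_string::compare hands to
// traits::compare: the shorter of the two lengths.
// compared_units is a hypothetical helper, for illustration only.
std::size_t compared_units(const std::string& a, const std::string& b)
{
    return std::min(a.size(), b.size());
}

// "z" is one byte, while "å" in UTF-8 is the two bytes 0xC3 0xA5.
// Comparing them therefore passes a single byte of "å" to the facet,
// which is an incomplete UTF-8 sequence rather than a character.
const std::string z_utf8 = "z";
const std::string a_ring_utf8 = "\xC3\xA5";
```

With wchar_t each of these letters is a single unit, so the truncation does not split a character.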

你好瞎i
#4 · 2019-02-12 17:52

In C++ you need to use the standard collate facet. Check it out.

爱情/是我丢掉的垃圾
#5 · 2019-02-12 17:56

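operator() of std::locale is just what you are looking for: it compares two strings using the locale's collate facet, so a std::locale object can be used directly as a comparison predicate. To get the current global locale, just use the default constructor.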
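
A short sketch of how that might be used, since a std::locale object can be passed wherever a comparison predicate is expected. The names sort_with_locale and collated_set are hypothetical.

```cpp
#include <algorithm>
#include <locale>
#include <set>
#include <string>
#include <vector>

// Sort strings in place using the collation rules of loc.
// sort_with_locale is a hypothetical helper name.
void sort_with_locale(std::vector<std::string>& words,
                      const std::locale& loc = std::locale())
{
    // std::locale::operator() compares two strings via the collate facet,
    // so the locale object itself serves as the comparator.
    std::sort(words.begin(), words.end(), loc);
}

// An ordered container can use the same predicate type. A default-constructed
// collated_set copies the global locale in effect at construction time.
typedef std::set<std::string, std::locale> collated_set;
```

Because the comparator is a copy of the locale, changing the global locale later does not affect a container that has already been constructed.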
