How to implement a natural sort algorithm in c++?

2019-01-11 18:12发布

I'm sorting strings that are comprised of text and numbers. I want the sort to sort the number parts as numbers, not alphanumeric.

For example I want: abc1def, ..., abc9def, abc10def

instead of: abc10def, abc1def, ..., abc9def

Does anyone know an algorithm for this (in particular in c++)

Thanks

6条回答
时光不老,我们不散
2楼-- · 2019-01-11 18:44
// -1: s0 < s1; 0: s0 == s1; 1: s0 > s1
static int numericCompare(const string &s0, const string &s1) {
    size_t i = 0, j = 0;
    for (; i < s0.size() && j < s1.size();) {
        string t0(1, s0[i++]);
        while (i < s0.size() && !(isdigit(t0[0]) ^ isdigit(s0[i]))) {
            t0.push_back(s0[i++]);
        }
        string t1(1, s1[j++]);
        while (j < s1.size() && !(isdigit(t1[0]) ^ isdigit(s1[j]))) {
            t1.push_back(s1[j++]);
        }
        if (isdigit(t0[0]) && isdigit(t1[0])) {
            size_t p0 = t0.find_first_not_of('0');
            size_t p1 = t1.find_first_not_of('0');
            t0 = p0 == string::npos ? "" : t0.substr(p0);
            t1 = p1 == string::npos ? "" : t1.substr(p1);
            if (t0.size() != t1.size()) {
                return t0.size() < t1.size() ? -1 : 1;
            }
        }
        if (t0 != t1) {
            return t0 < t1 ? -1 : 1;
        }
    }
    return i == s0.size() && j == s1.size() ? 0 : i != s0.size() ? 1 : -1;
}

I am not very sure if it is you want, anyway, you can have a try:-)

查看更多
贪生不怕死
3楼-- · 2019-01-11 18:53

This is known as natural sorting. There's an algorithm here that looks promising.

Be careful of problems with non-ASCII characters (see Jeff's blog entry on the subject).

查看更多
混吃等死
4楼-- · 2019-01-11 18:56

I asked this exact question (although in Java) and got pointed to http://www.davekoelle.com/alphanum.html which has an algorithm and implementations of it in many languages.

查看更多
聊天终结者
5楼-- · 2019-01-11 19:04

To solve what is essentially a parsing problem a state machine (aka finite state automaton) is the way to go. Dissatisfied with the above solutions i wrote a simple one-pass early bail-out algorithm that beats C/C++ variants suggested above in terms of performance, does not suffer from numerical datatype overflow errors, and is easy to modify to add case insensitivity if required.

sources can be found here

查看更多
贪生不怕死
6楼-- · 2019-01-11 19:06

Several natural sort implementations for C++ are available. A brief review:

  • natural_sort<> - based on Boost.Regex.
    • In my tests, it's roughly 20 times slower than other options.
  • Dirk Jagdmann's alnum.hpp, based on Dave Koelle's alphanum algorithm
    • Potential integer overlow issues for values over MAXINT
  • Martin Pool's natsort - written in C, but trivially usable from C++.
    • The only C/C++ implementation I've seen to offer a case insensitive version, which would seem to be a high priority for a "natural" sort.
    • Like the other implementations, it doesn't actually parse decimal points, but it does special case leading zeroes (anything with a leading 0 is assumed to be a fraction), which is a little weird but potentially useful.
    • PHP uses this algorithm.
查看更多
唯我独甜
7楼-- · 2019-01-11 19:09

Partially reposting my another answer:

bool compareNat(const std::string& a, const std::string& b){
    if (a.empty())
        return true;
    if (b.empty())
        return false;
    if (std::isdigit(a[0]) && !std::isdigit(b[0]))
        return true;
    if (!std::isdigit(a[0]) && std::isdigit(b[0]))
        return false;
    if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
    {
        if (a[0] == b[0])
            return compareNat(a.substr(1), b.substr(1));
        return (toUpper(a) < toUpper(b));
        //toUpper() is a function to convert a std::string to uppercase.
    }

    // Both strings begin with digit --> parse both numbers
    std::istringstream issa(a);
    std::istringstream issb(b);
    int ia, ib;
    issa >> ia;
    issb >> ib;
    if (ia != ib)
        return ia < ib;

    // Numbers are the same --> remove numbers and recurse
    std::string anew, bnew;
    std::getline(issa, anew);
    std::getline(issb, bnew);
    return (compareNat(anew, bnew));
}

toUpper() function:

std::string toUpper(std::string s){
    for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
    return s;
    }

Usage:

std::vector<std::string> str;
str.push_back("abc1def");
str.push_back("abc10def");
...
std::sort(str.begin(), str.end(), compareNat);
查看更多
登录 后发表回答