Runtime error: pointer index expression with base

Posted 2020-07-30 00:10

Question:

The task is to sort the letters of a given string in decreasing order of their frequencies.

E.g., if string = "tree", the output is "eert" or "eetr". I used an unordered_map to count the frequency of each letter, then appended the letters to the result string in decreasing order of frequency.

This is what I have tried:

string frequencySort(string s1) {
    unordered_map<char,int> temp;
    for(char& c: s1)
        temp[c]++;
    string s = "";
    while(!temp.empty()) {
        int max = 0;
        char c='a';
        for(auto it:temp) {
            if(it.second > max) {
                c = it.first;
                max = it.second;
            }
        }
        for(int j=0;j<max;j++)
            s = s + c;
        temp.erase(temp.find(c));
    }
    return s;
}

My code does not work for large inputs, and changing int to long long does not help, so the maximum frequency is within INT_MAX. I get this error:

Runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffff

I cannot paste the particular test case here as it exceeds the permissible body size for a question.

Any help is appreciated.

Answer 1:

There is nothing logically wrong with the code, but it has several inefficiencies that could make you run out of memory on a low-memory machine.

First, you pass the string by value:

string frequencySort(string s1) {

This makes a new copy of the string on each call, using twice as much memory as necessary.

Instead, prefer:

string frequencySort(const string & s1) {

The repeated reallocation required for the string can cause fragmentation in the memory manager, and cause more rapid out-of-memory issues:

    for(int j=0;j<max;j++)
        s = s + c;

To minimize reallocation issues, use reserve:

string s = "";
s.reserve(s1.length());

And the biggest performance issue:

        s = s + c;

The above code copies the same string again and again, running in O(N^2) time and wreaking havoc on the heap with massive fragmentation. Appending in place (s += c) avoids the repeated copy.
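The three points above (const reference, reserve, in-place append) can be combined in one place. The following is only a sketch that keeps the unordered_map approach from the question; the function name frequencySortAppend is made up for illustration and is not from the original post.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical name: the question's algorithm with the three fixes applied.
std::string frequencySortAppend(const std::string& s1) {  // const reference: no copy of the input
    std::unordered_map<char, int> counts;
    for (char c : s1)
        ++counts[c];

    std::string result;
    result.reserve(s1.size());  // the output has exactly as many characters as the input

    while (!counts.empty()) {
        // Find the remaining letter with the highest count.
        auto best = counts.begin();
        for (auto it = counts.begin(); it != counts.end(); ++it)
            if (it->second > best->second)
                best = it;

        // Append `count` copies of the letter in place instead of s = s + c.
        result.append(static_cast<std::size_t>(best->second), best->first);
        counts.erase(best);
    }
    return result;
}
```

With append writing into the reserved buffer, building the result is linear in the input length rather than quadratic.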

There are also simpler inefficiencies in the code that might have a big impact on runtime for large inputs, although they don't affect complexity. Using unordered_map for such a small key set (26 English letters) carries a lot of time overhead; it might be more efficient to use std::map in this case. For large inputs it is more efficient still to hold a plain array:

int map[256] = {0};

Unfortunately, for small inputs it might be slower. It also will not work so well for wide characters (where there are over 2^16 possible values). But for ASCII it should work pretty well.
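As a rough sketch of the array idea, assuming 8-bit char: count into a 256-entry table indexed by unsigned char (so that bytes above 127 do not become negative indices), then order the 256 byte values by count. The name frequencySortArray and the use of std::stable_sort for the ordering are my own choices for illustration, not part of the original answer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Hypothetical name: array-based counting, assuming 8-bit char (256 possible values).
std::string frequencySortArray(const std::string& s1) {
    std::array<std::size_t, 256> counts{};  // value-initialized, so all zeros
    for (unsigned char c : s1)              // unsigned char keeps the index in 0..255
        ++counts[c];

    // Order the byte values by decreasing count.
    // stable_sort keeps ties in ascending byte order, so the output is deterministic.
    std::array<int, 256> order;
    for (int i = 0; i < 256; ++i)
        order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&counts](int a, int b) { return counts[a] > counts[b]; });

    std::string result;
    result.reserve(s1.size());
    for (int byte : order) {
        if (counts[byte] == 0)
            break;  // everything after this point has count zero as well
        result.append(counts[byte], static_cast<char>(byte));
    }
    return result;
}
```

Sorting 256 entries is a fixed cost regardless of input size, which is why this version pays off mainly on large inputs.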

As a benchmark I ran the string that results from this command:

 perl -e 'print "abcdefghijklmnopqrstuvwxyza" x 1000000; print "y\n"'

which generates a string of about 27 million characters (the repeated pattern is 27 characters long: a to z plus a second a). The code with int map[256] completed in less than 4 seconds on my laptop.
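To repeat a measurement like this without perl, something along the following lines might do. It is only a sketch: makeBenchmarkInput and timeMillis are hypothetical helper names, the trailing newline from the perl command is left out, and the timings will of course differ from machine to machine.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the same text as the perl one-liner above,
// i.e. the 27-character pattern repeated `repeats` times plus a final 'y'.
std::string makeBenchmarkInput(std::size_t repeats) {
    const std::string pattern = "abcdefghijklmnopqrstuvwxyza";
    std::string input;
    input.reserve(pattern.size() * repeats + 1);
    for (std::size_t i = 0; i < repeats; ++i)
        input += pattern;
    input += 'y';
    return input;
}

// Hypothetical helper: runs sortFn(input), stores the result in `output`,
// and returns the elapsed wall-clock time in milliseconds.
template <typename SortFn>
double timeMillis(SortFn sortFn, const std::string& input, std::string& output) {
    const auto start = std::chrono::steady_clock::now();
    output = sortFn(input);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeMillis with any frequencySort implementation and makeBenchmarkInput(1000000) gives an input of the same size as the one used above.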



Tags: c++ algorithm