General strategies for memory/speed problems

Published 2019-02-26 09:25

Question:

I have a c++ code which runs through about 200 ASCII files, does some basic data processing, and outputs a single ASCII file with (basically) all of the data.

The program runs very quickly at first, then slows down drastically part way through, perhaps slows a little more gradually after that, then proceeds at a fairly slow pace through the rest. That is, it gets through the first ~80 files in about 5 seconds, but takes about 50 seconds for all ~200 files. Each file is basically the same.

I'm looking for suggestions on how to track down the problem or memory leak.


Some more details: At first I would fopen() the output file with mode "w" at the beginning of my program and fclose() it at the end. The first ~40 files would take ~4 seconds; then it took about 1.5 minutes for ~200 files.

I thought maybe the output file was clogging the memory, so I changed the code to fopen(outputFile, "a") on each iteration (i.e. each time I opened a new input file) and fclose() each time I closed the input file... this improved the performance to ~50 seconds total, as mentioned above.

It seems strange that this 'fix' would help so noticeably, but not entirely.

Also, I'm not dynamically allocating any memory (no calls to 'new' or 'delete' or 'free' or the like)... so I'm not even sure how I could have a memory leak.

Any help would be appreciated! Thanks!


Code:

vector<string> dirCon;
// Uses boost::filesystem to store every file in directory
bool retVal = FileSystem::getDirectoryContents(HOME_DIR+HISTORY_DIR, &dirCon, 2);

int goodLines = 0;  // counts successfully parsed data lines
for(int i = 0; i < dirCon.size(); i++) { 
    // Create output file
    FILE *outFile;
    string outputFileName = HOME_DIR ... ;
    // open file as append "a"
    bool ifRet = initFile(outFile, outputFileName.c_str(), "a");
    if(!ifRet) {
        fprintf(stderr, "ERROR ... ");
        return false;
    }       

    // Get the topmost directory name
    size_t loc = dirCon.at(i).find_last_of("/");
    string dirName = dirCon.at(i).substr(loc+1, (dirCon.at(i).size()-(loc+1)));

    // Get the top directory content
    vector<string> subDirCon;
    bool subRetVal = FileSystem::getDirectoryContents(dirCon.at(i), &subDirCon);
    if(!subRetVal) { fprintf(stderr, "ERROR\n"); return false; }

    // Go through each file in directory, look for the one that matches
    for(int j = 0; j < subDirCon.size(); j++) {

        // Get filename
        loc = subDirCon.at(j).find_last_of("/");
        string fileName = subDirCon.at(j).substr(loc+1, (subDirCon.at(j).size()-(loc+1)));

        // If filename matches desired station, process and store
        if( fileName == string(dirName ...) ) {
            // Open File
            FILE *inFile;
            if(!initFile(inFile, subDirCon.at(j).c_str(), "r")) { 
                fprintf(stderr, "ERROR: ... !\n");
                break;
            }

            // Parse file line-by-line
            char str[TB_CHARLIMIT_LARGE];
            const char *delim = ",";
            while(true) {
                vector<string> splitString;
                // A NULL return covers both end-of-file and read errors
                if(fgets(str, TB_CHARLIMIT_LARGE, inFile) == NULL) { break; }
                removeEndLine(str);

                // If non-comment line, parse
                if(str[0] != COMCHAR){
                    string strString(str);
                    // remove end line char
                    strString.erase(std::remove(strString.begin(), strString.end(), '\n'), strString.end());
                    strcpy(str, strString.c_str());

                    char *temp = strtok(str,delim);
                    while(temp != NULL) {
                        splitString.push_back(string(temp));
                        temp = strtok(NULL,delim);
                    }
                    if(splitString.size() > 0) { 
                        DateTime dtTemp(splitString.at(0));  
                        goodLines++;

                        /*  ... process splitString, use dtTemp ... */

                        // Output to file
                        fprintf(outFile, "%s\n", strFromStrVec(splitString).c_str());
                    }
                }
            } //while
            fclose(inFile); 
        }
    } //j
    cout << "GoodLines = " << goodLines << endl;

    fclose(outFile);
} // i

bool getDirectoryContents(const string dirName, vector<string> *conts) {
    path p(dirName);
    try {
        // Confirm Exists
        if(!exists(p)) {
            fprintf(stderr, "ERROR: '%s' does not exist!\n", dirName.c_str());
            return false;
        }

        // Confirm Directory
        if(!is_directory(p)) {
            return false;
        }

        conts->clear();

        // Store paths to sort later
        typedef vector<path> vec;
        vec v;

        copy(directory_iterator(p), directory_iterator(), back_inserter(v));

        sort(v.begin(), v.end()); 

        for(vec::const_iterator it(v.begin()), it_end(v.end()); it != it_end; ++it) {
            conts->push_back(it->string());
        }


    } catch(const filesystem_error& ex) {
        fprintf(stderr, "ERROR: '%s'!\n", ex.what());
        return false;
    }   

    return true;
}

Answer 1:

Without more information to go on, I'd guess that what you're dealing with is a Schlemiel the Painter's algorithm: (Original) (Wikipedia). They're incredibly easy to fall into doing string processing. Let me give you an example.

I want to read every line in a file, run each line through some intermediate processing, then gather up the results and maybe write them back to disk. Here's a way to do that, in which I make a single huge mistake that can be easy to miss:

// proc.cpp
#include <cstdio>
#include <string>
class Foo
{
  public:
  std::string chew_on(std::string const& line_to_chew_on) {...}
  ...
};

Foo processor;
std::string buffer;

// Read/process
FILE *input=fopen(..., "r");
char linebuffer[1000+1];
for (char *line=fgets(linebuffer, 1000, input); line; 
     line=fgets(linebuffer, 1000, input) ) 
{
    buffer=buffer+processor.chew_on(line);  //(1)
}
fclose(input);

// Write
FILE *output=fopen(...,"w");
fwrite(buffer.data(), 1, buffer.size(), output);
fclose(output);

The problem here, which is easy to miss at first glance, is that each time line (1) is run, the entire contents of buffer is copied. If there are 1,000 lines with 100 characters each, you end up copying 100+200+300+...+100,000 = 50,050,000 bytes to run this. Increase it to 10,000 lines? 5,000,500,000 bytes. That paint can is getting further and further away.

In this particular example, the fix is easy. Line (1) should read:

    buffer.append(processor.chew_on(line)); // (2)

or, equivalently (thanks, Matthieu M.):

    buffer += processor.chew_on(line);

This helps because (usually) std::string won't need to make a full copy of buffer to perform the append, whereas in (1), we're insisting that a copy be made.
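
As a further refinement (a sketch, not part of the original answer; the function name and inputs are illustrative): if you can estimate the total size up front, std::string::reserve removes even the occasional reallocation that amortized growth performs.

    #include <cstddef>
    #include <string>
    #include <vector>

    // Join lines into one buffer with a single up-front allocation.
    std::string join_lines(const std::vector<std::string>& lines)
    {
        std::size_t total = 0;
        for (const auto& l : lines) total += l.size() + 1; // +1 for '\n'

        std::string buffer;
        buffer.reserve(total);   // one allocation instead of repeated regrowth
        for (const auto& l : lines) {
            buffer += l;         // appends in place; never recopies buffer
            buffer += '\n';
        }
        return buffer;
    }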

More generally, suppose (a) the processing you're doing keeps state, (b) you reference all or most of that state often, and (c) the state grows over time. Then there's a fair-to-good chance that you've written a Θ(n²) algorithm, which will exhibit exactly the type of behavior you're describing.
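
The same trap in its classic C form (a contrived sketch, not from the question's code): strcat() must rescan dest from the start on every call, so appending n fragments costs Θ(n²) character scans even though no copy of the whole buffer is ever made.

    #include <cstring>

    // dest is assumed large enough to hold all fragments plus the terminator.
    void build_report(char *dest, const char *const *fragments, int n)
    {
        dest[0] = '\0';
        for (int i = 0; i < n; ++i)
            std::strcat(dest, fragments[i]);  // rescans everything appended so far
    }
    // Keeping a running pointer to the end of dest makes this Θ(n) instead.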


Edit

Of course, the stock answer to "why is my code slow?" is "run a profiler." There are a number of tools and techniques for doing this. Some options include:

  • callgrind/kcachegrind (as suggested by David Schwartz)
  • Random Pausing (as suggested by Mike Dunlavey)
  • The GNU profiler, gprof
  • The GNU test coverage profiler, gcov
  • oprofile

    They've all got their strengths. "Random Pausing" is probably the simplest to implement, though it can be hard to interpret the results. 'gprof' and 'gcov' are basically useless on multithreaded programs. Callgrind is thorough but slow, and can sometimes play strange tricks on multithreaded programs. oprofile is fast, plays nicely with multithreaded programs, but can be difficult to use, and can miss things.
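
    Since "random pausing" needs no instrumentation at all, here is a sketch of a typical session (nothing in it is specific to this program): run under gdb, interrupt at a few random moments while the program is slow, and note which stacks keep showing up.

    $ gdb ./proc
    (gdb) run
    ^C                 # interrupt at a random moment during the slow phase
    (gdb) bt           # inspect the stack; note what it's doing
    (gdb) continue
    ^C
    (gdb) bt           # repeat several times; the hot path appears repeatedly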

    However, if you're trying to profile a single threaded program, and are developing with the GNU toolchain, gprof can be a wonderful option. Take my proc.cpp, above. For purposes of demonstration, I'm going to profile an unoptimized run. First, I rebuild my program for profiling (adding -pg to the compile and link steps):

    $ g++ -O0 -g -pg -o proc.o -c proc.cpp
    $ g++ -pg -o proc proc.o
    

    I run the program once to create profiling information:

    $ ./proc
    

    In addition to doing whatever it would normally do, this run will create a file called 'gmon.out' in the current directory. Now, I run gprof to interpret the result:

    $ gprof ./proc
    Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
    100.50      0.01     0.01   234937     0.00     0.00  std::basic_string<...> std::operator+<...>(...)
      0.00      0.01     0.00   234937     0.00     0.00  Foo::chew_on(std::string const&)
      0.00      0.01     0.00        1     0.00    10.05  do_processing(std::string const&, std::string const&)
    ...
    

    Yes indeed, 100.5% of my program's time is spent in std::string operator+. Well, ok, up to some sampling error. (I'm running this in a VM ... it seems that the timing being captured by gprof is off. My program took much longer than 0.01 cumulative seconds to run...)

    For my very simple example, gcov is a little less instructive. But here's what it happens to show. First, compile and run for gcov:

    $ g++ -O0 -fprofile-arcs -ftest-coverage -o proc proc.cpp
    $ ./proc
    $ gcov ./proc
    ...
    

    Between them, these steps create a bunch of files ending in .gcno, .gcda, and .gcov in the current directory. The .gcov files tell us how many times each line of code was executed during the run. So, in my example, my proc.cpp.gcov ends up looking like this:

            -:    0:Source:proc.cpp
            -:    0:Graph:proc.gcno
            -:    0:Data:proc.gcda
            -:    0:Runs:1
            -:    0:Programs:1
            -:    1:#include <cstdio>
            -:    2:#include <string>
            -:    4:class Foo
            -:    5:{
            -:    6:  public:
       234937:    7:  std::string chew_on(std::string const& line_to_chew_on) {return line_to_chew_on;}
            -:    8:};
            -:    9:
            -:   10:
            -:   11:
            1:   12:int do_processing(std::string const& infile, std::string const& outfile)
            -:   13:{
            -:   14:  Foo processor;
            2:   15:  std::string buffer;
            -:   16:
            -:   17:  // Read/process
            1:   18:  FILE *input=fopen(infile.c_str(), "r");
            -:   19:  char linebuffer[1000+1];
       234938:   20:  for (char *line=fgets(linebuffer, 1000, input); line; 
            -:   21:       line=fgets(linebuffer, 1000, input) ) 
            -:   22:    {
       234937:   23:      buffer=buffer+processor.chew_on(line);  //(1)
            -:   24:    }
            1:   25:  fclose(input);
            -:   26:
            -:   27:  // Write
            1:   28:  FILE *output=fopen(outfile.c_str(),"w");
            1:   29:  fwrite(buffer.data(), 1, buffer.size(), output);
            1:   30:  fclose(output);
            1:   31:}
            -:   32:
            1:   33:int main()
            -:   34:{
            1:   35:  do_processing("/usr/share/dict/words","outfile");
            -:   36:}
    

    So from this, I'm going to have to conclude that the std::string::operator+ at line 23 (which is executed 234,937 times) is a potential cause of my program's slowness.

    As an aside, callgrind/kcachegrind work with multithreaded programs, and can provide much, much more information. For this program I run:

    g++ -O0 -o proc proc.cpp
    valgrind --tool=callgrind ./proc  # this takes forever to run
    kcachegrind callgrind.out.*
    

    And the kcachegrind output (shown as a screenshot in the original post) reveals that what's really eating up my cycles is lots and lots of memory copies (99.4% of execution time is spent in __memcpy_ssse3_back), all of which happen somewhere below line 23 in my source.



Answer 2:

    Analyze your code with callgrind, part of the valgrind suite. You can graphically browse the results with kcachegrind. (Despite its name, it works on callgrind output too.) It's free and will give you awesome detail.

    You can also externally turn data collection off and on. Start with it off, wait until your program gets slow, turn it on during the problem period, then turn it off again. You'll see where the CPU was going. If necessary, do the same thing in reverse, watching only while it's fast, and compare. A sketch of that workflow follows.
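
    Concretely (a sketch; these are standard callgrind options, with ./proc standing in for your program):

    $ valgrind --tool=callgrind --instr-atstart=no ./proc   # start with collection off
    $ callgrind_control -i on     # from another shell, once the slowdown appears
    $ callgrind_control -i off    # and off again after the slow phase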

    Usually, the problem will stick out like a sore thumb.



    Answer 3:

    Could you share your program?

    1. One thing to look for is whether you are using data structures that don't scale with the number of elements; see the sketch after this list.

    For example, using a list to hold a million elements makes traversal/search O(n), as opposed to, say, O(log n) for a binary search tree or O(1) average for a hash table.

    2. You should look at whether you are holding on to data at the end of each cycle/run. Ideally you should release all resources at the end of each run.

    3. It also sounds like there may be a handle leak.
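
    To make point 1 concrete, a minimal sketch (names are illustrative, not from the question's code): a membership test is O(n) against a list, so a loop doing n of them degrades to O(n²) overall, while a hash set keeps each lookup O(1) on average.

    #include <list>
    #include <string>
    #include <unordered_set>

    // O(n) per lookup: scans the whole list every call.
    bool seen_in_list(const std::list<std::string>& seen, const std::string& key)
    {
        for (const auto& s : seen)
            if (s == key) return true;
        return false;
    }

    // O(1) average per lookup: hashes the key once.
    bool seen_in_set(const std::unordered_set<std::string>& seen, const std::string& key)
    {
        return seen.count(key) != 0;
    }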



    Answer 4:

    This is a total shot in the dark. You've got:

    bool getDirectoryContents(const string dirName, vector<string> *conts) {
        ...
        copy(directory_iterator(p), directory_iterator(), back_inserter(v));
    

    How does the performance change if you instead make that:

    bool getDirectoryContents(const string dirName, vector<string> *conts) {
        ...
        // note: preincrementing the iterator
        for (directory_iterator it((p)); it!=directory_iterator(); ++it) {
           v.push_back(*it);
        }
    

    My thought is that std::copy is typically implemented in terms of postincrement (*result++ = *first++), and boost::filesystem::directory_iterator is an InputIterator: single-pass iterators like that shouldn't really be postincremented, and directory_iterator may not be happy when it is.
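
    To see why postincrement can hurt, here is a toy single-pass iterator (illustrative only; this is not Boost's code). Preincrement advances in place, but postincrement must hand back a snapshot of the old state, which forces a copy of whatever the iterator carries:

    #include <cstddef>
    #include <string>
    #include <vector>

    struct toy_dir_iterator
    {
        std::vector<std::string> entries;  // stand-in for heavy directory state
        std::size_t pos = 0;

        toy_dir_iterator& operator++()     // preincrement: advance in place
        {
            ++pos;
            return *this;
        }

        toy_dir_iterator operator++(int)   // postincrement: copy, then advance
        {
            toy_dir_iterator old(*this);   // copies the whole state
            ++pos;
            return old;                    // caller keeps the pre-increment snapshot
        }
    };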