I have a C++ program which runs through about 200 ASCII files, does some basic data processing, and outputs a single ASCII file with (basically) all of the data.
The program runs very quickly at first, then slows down drastically part way through, perhaps slows gradually a little more, then proceeds at a fairly slow pace through the rest. I.e. it gets through the first ~80 files in about 5 seconds, ~200 total files in about 50 seconds. Each file is basically the same.
I'm looking for suggestions on how to track down the problem or memory leak.
Some more details: At first I would fopen(FILE *outputFile, "w") at the beginning of my program and fclose() at the end. The first ~40 files would take ~4 seconds; then about 1.5 minutes for ~200 files.
I thought maybe the output file was clogging the memory, so I changed the code to fopen(outputFile, "a") each iteration (i.e. each time I opened a new input file), and fclose() each time I closed the input file... this increased the performance to ~50 seconds total, as mentioned above.
It seems strange that this 'fix' would help so noticeably, but not entirely.
Also, I'm not dynamically allocating any memory (no calls to 'new' or 'delete' or 'free' or whatever).... so I'm not even sure how I could have a memory leak.
Any help would be appreciated! Thanks!
Code:
vector<string> dirCon;
// Uses boost::filesystem to store every file in directory
bool retVal = FileSystem::getDirectoryContents(HOME_DIR+HISTORY_DIR, &dirCon, 2);
int counter = 0;
for(int i = 0; i < dirCon.size(); i++) {
    // Create output file
    FILE *outFile;
    string outputFileName = HOME_DIR ... ;
    // open file as append "a"
    bool ifRet = initFile(outFile, outputFileName.c_str(), "a");
    if(!ifRet) {
        fprintf(stderr, "ERROR ... ");
        return false;
    }

    // Get the topmost directory name
    size_t loc = dirCon.at(i).find_last_of("/");
    string dirName = dirCon.at(i).substr(loc+1, (dirCon.at(i).size()-(loc+1)));

    // Get the top directory content
    vector<string> subDirCon;
    bool subRetVal = FileSystem::getDirectoryContents(dirCon.at(i), &subDirCon);
    if(!subRetVal) { fprintf(stderr, "ERROR\n"); return false; }

    // Go through each file in directory, look for the one that matches
    for(int j = 0; j < subDirCon.size(); j++) {
        // Get filename
        loc = subDirCon.at(j).find_last_of("/");
        string fileName = subDirCon.at(j).substr(loc+1, (subDirCon.at(j).size()-(loc+1)));

        // If filename matches desired station, process and store
        if( fileName == string(dirName ...) ) {
            // Open File
            FILE *inFile;
            if(!initFile(inFile, subDirCon.at(j).c_str(), "r")) {
                fprintf(stderr, "ERROR: ... !\n");
                break;
            }

            // Parse file line-by-line
            char str[TB_CHARLIMIT_LARGE];
            const char *delim = ",";
            while(true) {
                vector<string> splitString;
                fgets(str, TB_CHARLIMIT_LARGE, inFile);
                if(feof(inFile)) { break; } // break at end of file
                removeEndLine(str);

                // If non-comment line, parse
                if(str[0] != COMCHAR) {
                    string strString(str);
                    // remove end line char
                    strString.erase(std::remove(strString.begin(), strString.end(), '\n'), strString.end());
                    strcpy(str, strString.c_str());

                    char *temp = strtok(str, delim);
                    char *lastTemp;
                    while(temp != NULL) {
                        splitString.push_back(string(temp));
                        temp = strtok(NULL, delim);
                    }

                    if(splitString.size() > 0) {
                        DateTime dtTemp(splitString.at(0));
                        goodLines++;
                        /* ... process splitString, use dtTemp ... */
                        // Output to file
                        fprintf(outFile, "%s\n", strFromStrVec(splitString).c_str());
                    }
                }
            } //while
            fclose(inFile);
        }
    } //j
    cout << "GoodLines = " << goodLines << endl;
    fclose(outFile);
} // i
bool getDirectoryContents(const string dirName, vector<string> *conts) {
    path p(dirName);
    try {
        // Confirm Exists
        if(!exists(p)) {
            fprintf(stderr, "ERROR: '%s' does not exist!\n", dirName.c_str());
            return false;
        }
        // Confirm Directory
        if(!is_directory(p)) {
            return false;
        }

        conts->clear();

        // Store paths to sort later
        typedef vector<path> vec;
        vec v;
        copy(directory_iterator(p), directory_iterator(), back_inserter(v));
        sort(v.begin(), v.end());

        for(vec::const_iterator it(v.begin()), it_end(v.end()); it != it_end; ++it) {
            conts->push_back(it->string());
        }
    } catch(const filesystem_error& ex) {
        fprintf(stderr, "ERROR: '%s'!\n", ex.what());
        return false;
    }
    return true;
}
This is a total shot in the dark. In getDirectoryContents you've got:

copy(directory_iterator(p), directory_iterator(), back_inserter(v));

My thought is that std::copy is specified to use postincrement, and boost::filesystem::directory_iterator is an InputIterator: it shouldn't really support postincrement, and may not be happy being postincremented. How does the performance change if you instead fill the vector with an explicit loop?
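For example, something like this sketch (it assumes the same p and v as in getDirectoryContents above, and only ever preincrements the iterator):

// Fill v without std::copy, so the directory_iterator is only preincremented.
for(boost::filesystem::directory_iterator it(p); it != boost::filesystem::directory_iterator(); ++it) {
    v.push_back(it->path());
}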
Without more information to go on, I'd guess that what you're dealing with is a Schlemiel the Painter's algorithm: (Original) (Wikipedia). They're incredibly easy to fall into when doing string processing. Let me give you an example.
I want to read every line in a file, process each line somehow, and run it through some intermediate processing. Then I want to gather up the results and maybe write them back to disk. Here's a way to do that, with a single huge mistake that can be easy to miss.
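A condensed sketch of that kind of program might look like this (call it proc.cpp), with the problem line marked (1):

#include <iostream>
#include <string>

// proc.cpp -- read lines, "process" them, gather the results into buffer.
int main() {
    std::string buffer;
    std::string line;
    while (std::getline(std::cin, line)) {
        // ... per-line processing would go here ...
        buffer = buffer + line + "\n";   // (1)
    }
    // ... some processing over buffer, then write it out ...
    std::cout << buffer;
    return 0;
}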
The problem here, which can be easy to miss at first glance, is that each time line (1) is run, the entire contents of buffer are copied. If there are 1,000 lines with 100 characters each, you end up copying 100+200+300+400+...+100,000 = 50,050,000 bytes to run this. Increase to 10,000 lines? 5,000,500,000. That paint can is getting further and further away. In this particular example, the fix is easy.
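Line (1) should read something like one of these (a sketch; the second form is the explicit std::string::append equivalent, thanks Matthieu M.):

buffer += line + "\n";                 // (1) -- appends to buffer in place
// or equivalently:
buffer.append(line).append(1, '\n');   // (1) -- explicit append calls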
This manages to help because (usually) std::string won't need to make a full copy of buffer to perform the append, whereas in (1) we're insisting that a copy be made.

More generally, suppose (a) the processing you're doing keeps state, (b) you reference all or most of it often, and (c) that state grows over time. Then there's a fair to good chance that you've written a Θ(n²) time algorithm, which will exhibit exactly the type of behavior you're talking about.
Edit
Of course, the stock answer to "why is my code slow?" is "run a profile." There are a number of tools and techniques for doing this; some options include "random pausing" (stopping the program in a debugger and looking at the stack), gprof, gcov, callgrind (from the valgrind suite), and oprofile.
They've all got their strengths. "Random pausing" is probably the simplest to implement, though it can be hard to interpret the results. gprof and gcov are basically useless on multithreaded programs. Callgrind is thorough but slow, and can sometimes play strange tricks on multithreaded programs. oprofile is fast and plays nicely with multithreaded programs, but can be difficult to use and can miss things.
However, if you're trying to profile a single-threaded program and are developing with the GNU toolchain, gprof can be a wonderful option. Take my proc.cpp, above. For purposes of demonstration, I'm going to profile an unoptimized run. First, I rebuild my program for profiling by adding -pg to the compile and link steps (e.g., g++ -pg proc.cpp -o proc), then I run the program once as usual to create profiling information.
In addition to doing whatever it would normally do, this run will create a file called 'gmon.out' in the current directory. Now, I run gprof on the result (gprof proc gmon.out) to interpret it. Yes indeed, 100.5% of my program's time is spent in std::string operator+. Well, ok, up to some sampling error. (I'm running this in a VM ... it seems that the timing being captured by gprof is off. My program took much longer than 0.01 cumulative seconds to run...)

For my very simple example, gcov is a little less instructive, but here's what it happens to show. First, compile and run for gcov (e.g., build with g++ --coverage proc.cpp -o proc, run the program, then run gcov proc.cpp).
This creates a bunch of files ending in .gcno, .gcda, and .gcov in the current directory. The .gcov files tell us how many times each line of code was executed during the run. So, in my example, my proc.cpp.gcov shows the per-line execution counts, and from those I'm going to have to conclude that the std::string::operator+ at line 23 (which is executed 234,937 times) is a potential cause of my program's slowness.
As an aside, callgrind/kcachegrind work with multithreaded programs, and can provide much, much more information. For this program I run it under callgrind (e.g., valgrind --tool=callgrind ./proc), and the output shows that what's really eating up my cycles is lots and lots of memory copies (99.4% of execution time spent in __memcpy_ssse3_back), all of which I can see happening somewhere below line 23 in my source.

Analyze your code with callgrind, part of the valgrind suite. You can graphically browse the results with kcachegrind. (Despite its name, it works on callgrind output too.) It's free and will give you awesome detail.
You can also externally turn data collection off and on (e.g., start callgrind with --instr-atstart=no and toggle it with callgrind_control -i on/off). So start with it off, wait until your program gets slow, turn it on during the problem window, then turn it off. You'll see where the CPU was going. If necessary, do the same thing in reverse, watching only when it's fast, and compare.
Usually, the problem will stick out like a sore thumb.
Could you share your program?
1. Check which data structures you're using: e.g. using a list to hold a million elements would be extremely slow to traverse/search (O(n)) as opposed to, say, a binary search tree (O(log n)) or a hash table (O(1)). See the small sketch after this list.
2. You should look at whether you are holding on to the data at the end of each cycle (/burn/run). Ideally you should release all the resources at the end of each run.
3. Sounds like there may be a handle leak?
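To illustrate point 1, a tiny hypothetical sketch (none of these names come from the original program) comparing a linear scan against a tree-based lookup:

#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Linear scan: O(n) per lookup -- fine for a handful of items, painful for millions.
bool seenLinear(const std::vector<std::string> &v, const std::string &key) {
    return std::find(v.begin(), v.end(), key) != v.end();
}

// Balanced tree (std::set): O(log n) per lookup; a hash table would give O(1) on average.
bool seenTree(const std::set<std::string> &s, const std::string &key) {
    return s.count(key) != 0;
}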