I have a strategic question about the use of simultaneously opened fstreams.
I have to write a program that reads a large number of files. Each file contains information for a bunch of identifiers, each appearing once per file. I have to process this information and then save it, per identifier, to a separate file. Every identifier appears in several files and should always be written to the same output file (one file per identifier, with many entries).
I expect a few hundred identifiers, so I am hesitant to keep several hundred filestreams open simultaneously.
So is there a limit on the number of simultaneously open filestreams?
Or would you suggest another way of doing this?
The program will process a massive amount of data (about 10 GB or more) and may run for several hours.
Thanks
There's ultimately a limit to anything. Files are a perfect example of something managed by the operating system, and you will have to consult your OS documentation for the specific limit. In Linux, I believe it is configurable in the kernel. There may additionally be user and process quotas.
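If you're on a POSIX system (an assumption, since the question doesn't name an OS), you can also ask the process for its own quota instead of digging through documentation. A minimal sketch using getrlimit:

#include <iostream>
#include <sys/resource.h>   // getrlimit, RLIMIT_NOFILE (POSIX only)

int main() {
    rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
        // rlim_cur is the soft limit enforced right now for this process;
        // rlim_max is the hard ceiling it could be raised to.
        std::cout << "open-file soft limit: " << lim.rlim_cur << '\n'
                  << "open-file hard limit: " << lim.rlim_max << '\n';
    }
}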
I don't think 200 is too many to ask.
It's quite simple to try and see. Just write a program that keeps opening more files until you get an error.
Live example.
On Mac OS X 10.8, this program
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>

int main() {
    int i = 0;
    std::ofstream *f;
    do {
        // Leak each stream on purpose so it stays open while the next one is created.
        f = new std::ofstream( std::to_string( i++ ) );
    } while ( *f << "hello" << std::flush );
    -- i; // Don't count last iteration, which failed to open anything.
    std::cout << i << '\n';
}
produces the output 253. So if you're on a Mac, you're golden :).
The C++ standard does not define a limit on how many (or, I believe, how few, though I haven't checked) files you can have open at the same time.
A particular implementation of a C++ library may have a limit (which may or may not be documented). The operating system will most likely have some limit for the whole system, and another limit per process. What those limits are will vary, so there's no easy way to tell. And they may also be artificially lowered by various settings that the system owner configures.
And even if you know what all those limits are, there could be dynamic limits that vary depending on the circumstances - for example, if the whole system allows 16384 open files, the per-process limit is 1000, and the C++ library allows 1024, you may still not be able to open a single file, because there is no memory available for the OS to allocate some critical block of data.
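On the library side, the C standard library underneath the iostreams does publish one documented figure: FOPEN_MAX in <cstdio>, the minimum number of files the implementation guarantees can be open at once. It formally applies to C FILE streams rather than fstreams, so treat it only as a rough lower bound. A quick way to see it:

#include <cstdio>     // FOPEN_MAX
#include <iostream>

int main() {
    // The one figure the standard library itself documents: the minimum
    // number of simultaneously open files it guarantees to support.
    std::cout << "at least " << FOPEN_MAX << " files can be open at once\n";
}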
There is no limit in the C++ library itself on how many fstreams you can open simultaneously; however, your OS limits the number of files that can be open at the same time. Although a few hundred files doesn't sound like much for a typical OS, I would suggest you read all the information beforehand (possibly from several files at a time, but allowing for the possibility that a call to open fails, in which case you should retry after closing some of the previously opened files), then do the processing and store the results in some internal data structure. Finally, you can write the results back to the files, again several at a time, and again being prepared for a failed attempt to open a file.
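A minimal sketch of that buffer-then-flush idea, assuming hypothetical names (results, flush_results, per-identifier ".out" files) and that each processed record fits in a std::string:

#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory store: identifier -> processed records not yet written out.
std::unordered_map<std::string, std::vector<std::string>> results;

// Write everything buffered so far, touching only one output file at a time,
// so the program never approaches the per-process open-file limit.
void flush_results() {
    for (auto &entry : results) {
        if (entry.second.empty())
            continue;
        std::ofstream out(entry.first + ".out", std::ios::app);  // append: same file every time
        if (!out) {
            // Open failed; keep the data buffered and try again on the next flush.
            std::cerr << "could not open " << entry.first << ".out, will retry\n";
            continue;
        }
        for (const auto &record : entry.second)
            out << record << '\n';
        entry.second.clear();  // written successfully, release the memory
        // 'out' is destroyed (and the file closed) at the end of each iteration.
    }
}

Calling flush_results() periodically, for example whenever the buffered data crosses some size threshold, keeps memory bounded even for 10 GB of input, at the cost of reopening each output file on every flush.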