I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back() calls on a few vectors, together with the string constants that are to be pushed.
When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches
--param ggc-min-expand=0 --param ggc-min-heapsize=4096
may solve the problem. They, however, only seem to work when optimization is turned on.
1) Is this really the solution that I am looking for?
2) Or is there a faster, better way to do this (compiling takes ages with these options activated)?
Best wishes,
Alexander
Update: Thanks for all the good ideas. I tried most of them. Using an array instead of many individual push_back() operations reduced memory usage, but since the file I was trying to compile was so big, it still crashed, only later. In a way this behaviour is really interesting, since there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I also compiled with all optimizations disabled and got the same results.)
The solution I have switched to now is reading the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do it in C++.
However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and some of the problems I had disappeared once I manually set the output format to pe-i386. The symbols in the object file are named after the file name by default, e.g. converting the file inbuilt_training_data.bin results in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web claiming that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
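For reference, a minimal sketch of how the linked-in blob can then be used from C++; the objcopy flags in the comment and the helper name embedded_training_data are assumptions, only the two symbol names come from the description above:

    // The object file was created beforehand with something like
    // (exact flags vary by toolchain and target):
    //   objcopy -I binary -O pe-i386 inbuilt_training_data.bin inbuilt_training_data.o
    #include <cstddef>
    #include <string>

    extern char binary_inbuilt_training_data_bin_start;
    extern char binary_inbuilt_training_data_bin_end;

    // Hypothetical helper: copy the embedded data into a std::string.
    std::string embedded_training_data()
    {
        const char* begin = &binary_inbuilt_training_data_bin_start;
        const char* end   = &binary_inbuilt_training_data_bin_end;
        return std::string(begin, static_cast<std::size_t>(end - begin));
    }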
To complement some of the answers here, you may be better off generating a binary object file and linking it in directly -- as opposed to compiling files consisting of const char[] arrays. I had a similar problem working with gcc lately (around 60 MB of PNG data split into some 100 header files). Including them all is the worst option: the amount of memory needed seems to grow exponentially with the size of the compilation unit.
If you're just generating a bunch of calls to push_back() in a row, you can refactor it into something like this:
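A minimal sketch of that refactoring, assuming the target is a std::vector<std::string> named v (the function name fill and the placeholder array name stuff are illustrative):

    #include <string>
    #include <vector>

    void fill(std::vector<std::string>& v)
    {
        // One such block per chunk of generated data; the braces keep the
        // array name local, so many blocks can coexist in one function.
        {
            static const char* const stuff[] = {
                "first generated string",
                "second generated string",
                // ... further generated entries ...
            };
            // ARRAYCOUNT is the element-count macro defined below.
            v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
        }
    }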
Where ARRAYCOUNT is a macro defined as follows:
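A typical definition would be:

    // Number of elements in a statically sized array.
    #define ARRAYCOUNT(a) (sizeof(a) / sizeof((a)[0]))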
The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the stuff placeholder. If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?
It sounds like your autogenerated app looks like this:
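Presumably something along these lines; the vector name v and the function name fill are illustrative:

    #include <string>
    #include <vector>

    std::vector<std::string> v;

    void fill()
    {
        v.push_back("generated string 1");
        v.push_back("generated string 2");
        v.push_back("generated string 3");
        // ... tens of thousands of further push_back() calls ...
    }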
Why don't you put the data into an external file and let the program read this data in a loop?
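For example, with one entry per line in a plain text file; the file name, format, and function name load_strings are assumptions:

    #include <fstream>
    #include <string>
    #include <vector>

    std::vector<std::string> load_strings(const char* path)
    {
        std::vector<std::string> v;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line))  // one entry per line
            v.push_back(line);
        return v;
    }

    // Usage: std::vector<std::string> data = load_strings("training_data.txt");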
You may be better off using a constant data table instead. For example, instead of doing this:
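A sketch of the push_back-heavy version; the vector a and the function name fill are placeholders:

    #include <string>
    #include <vector>

    void fill(std::vector<std::string>& a)
    {
        a.push_back("one");
        a.push_back("two");
        a.push_back("three");
        // ... one call per generated string ...
    }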
try doing this:
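A sketch of the table-driven version, with the same placeholder names:

    #include <cstddef>
    #include <string>
    #include <vector>

    // All generated strings live in one constant data table ...
    static const char* const kData[] = {
        "one",
        "two",
        "three",
        // ... one entry per generated string ...
    };

    // ... and a small loop copies them into the vector at run time.
    void fill(std::vector<std::string>& a)
    {
        for (std::size_t i = 0; i < sizeof(kData) / sizeof(kData[0]); ++i)
            a.push_back(kData[i]);
    }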
The compiler will likely be much more efficient at generating a large constant data table than at generating huge functions containing many push_back() calls.

If you cannot refactor your code, you could try to increase the amount of swap space you have, provided your operating system supports a large address space. This should work for 64-bit computers, but 3 gigabytes might be too much for a 32-bit system.