gcc/g++: error when compiling large file

Posted 2019-05-26 13:17

I have an auto-generated C++ source file, around 40 MB in size. It consists largely of push_back calls on a few vectors, together with the string constants to be pushed.

When I try to compile this file, g++ exits, reporting that it could not reserve enough virtual memory (around 3 GB). Googling the problem, I found that the command-line switches

--param ggc-min-expand=0 --param ggc-min-heapsize=4096

may solve the problem. However, they only seem to take effect when optimization is turned on.

1) Is this really the solution that I am looking for?

2) Or is there a faster, better way to do this? (Compiling takes ages with these options activated.)

Best wishes,

Alexander

Update: Thanks for all the good ideas. I tried most of them. Using an array instead of many push_back() operations reduced memory usage, but since the file I was trying to compile was so big, the compiler still crashed, just later. In a way this behaviour is really interesting, since there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I also compiled with all optimizations disabled and got the same results.)

The solution I have switched to now is reading the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than doing it in C++.

However, getting this to run under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format by default, and some of the problems I had disappeared once I manually set the output format to pe-i386. The symbols in the object file are by default named after the file name; e.g., converting the file inbuilt_training_data.bin produces these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. Some tutorials on the web claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
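A minimal sketch of the access pattern described above. Note that the demo symbol here is defined locally with dummy contents so the sketch compiles standalone; in the real build, the symbols come from the object file that objcopy generates (e.g. objcopy -I binary -O pe-i386 inbuilt_training_data.bin), and the size is computed from the _start and _end symbols instead:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In the real build this symbol (and its matching ..._end) is provided by
// the objcopy-generated object file; demo contents are supplied here only
// so this sketch is self-contained.
char binary_inbuilt_training_data_bin_start[] = {'a', 'b', 'c'};
const std::size_t demo_size = sizeof(binary_inbuilt_training_data_bin_start);

// Copy the embedded bytes into a std::string. With the real objcopy
// symbols you would compute the length as &..._end - &..._start.
std::string load_embedded() {
    const char *begin = binary_inbuilt_training_data_bin_start;
    return std::string(begin, begin + demo_size);
}
```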

6 Answers
forever°为你锁心
#2 · 2019-05-26 13:33

To complement some of the answers here, you may be better off generating a binary object file and linking it directly, as opposed to compiling files consisting of const char[] arrays.

I had a similar problem with gcc recently (around 60 MB of PNG data split across some 100 header files). Including them all is the worst option: the amount of memory needed seems to grow exponentially with the size of the compilation unit.

在下西门庆
#3 · 2019-05-26 13:42

If you're just generating a bunch of calls to push_back() in a row, you can refactor them into something like this:

// Old code:
v.push_back("foo");
v.push_back("bar");
v.push_back("baz");

// Change that to this:
{
    static const char *stuff[] = {"foo", "bar", "baz"};
    v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
}

Where ARRAYCOUNT is a macro defined as follows:

#define ARRAYCOUNT(a) (sizeof(a) / sizeof(a[0]))

The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the stuff placeholder.

If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
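As a rough sketch of the split (the function and file names here are illustrative, not from the original post), each generated part becomes its own translation unit containing one filler function, and a small driver calls them in order:

```cpp
#include <string>
#include <vector>

// part1.cpp (generated) would contain only its own slice of the data:
void fill_part1(std::vector<std::string> &v) {
    v.push_back("foo");
    v.push_back("bar");
}

// part2.cpp (generated) likewise:
void fill_part2(std::vector<std::string> &v) {
    v.push_back("baz");
}

// The driver stitches the parts together, so no single compilation
// unit grows large enough to exhaust the compiler's memory.
std::vector<std::string> build_all() {
    std::vector<std::string> v;
    fill_part1(v);
    fill_part2(v);
    return v;
}
```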

Anthone
#4 · 2019-05-26 13:47

Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?

三岁会撩人
#5 · 2019-05-26 13:48

It sounds like your autogenerated code looks like this:

push_back(data00001);
...
push_back(data99999);

Why not put the data into an external file and let the program read it in a loop?
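A minimal sketch of that approach, reading one string per line (the stream-based helper is my own illustration, not from the answer; in the real program the stream would be a std::ifstream over the external data file):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one entry per line from any input stream and push each one back,
// replacing the generated sequence of push_back calls.
std::vector<std::string> load_lines(std::istream &in) {
    std::vector<std::string> v;
    std::string line;
    while (std::getline(in, line))
        v.push_back(line);
    return v;
}
```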

甜甜的少女心
#6 · 2019-05-26 13:51

You may be better off using a constant data table instead. For example, instead of doing this:

void f() {
    a.push_back("one");
    a.push_back("two");
    a.push_back("three");
    // ...
}

try doing this:

const char *data[] = {
    "one",
    "two",
    "three",
    // ...
};

void f() {
    for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++) {
        a.push_back(data[i]);
    }
}

The compiler will likely be much more efficient at generating a large constant data table than at compiling huge functions containing many push_back() calls.

疯言疯语
#7 · 2019-05-26 13:52

If you cannot refactor your code, you could try increasing the amount of swap space you have, provided your operating system supports a large address space. This should work on 64-bit machines, but 3 gigabytes may be too much for a 32-bit system.
