“pure virtual function called” on gcc 4.4 but not

Posted 2019-02-12 08:37

Question:

I've got an MCVE which, on some of my machines, crashes when compiled with g++ version 4.4.7 but works with clang++ version 3.4.2 and g++ version 6.3.

I'd like help determining whether this comes from undefined behavior in my code or from an actual bug in this ancient version of gcc.

Code

#include <cstdlib>

class BaseType
{
public:
    BaseType() : _present( false ) {}
    virtual ~BaseType() {}

    virtual void clear() {}

    virtual void setString(const char* value, const char* fieldName)
    {
        _present = (*value != '\0');
    }

protected:
    virtual void setStrNoCheck(const char* value) = 0;

protected:
    bool _present;
};

// ----------------------------------------------------------------------------------

class TypeTextFix : public BaseType
{
public:
    virtual void clear() {}

    virtual void setString(const char* value, const char* fieldName)
    {
        clear();
        BaseType::setString(value, fieldName);
        if( _present == false ) {
            return; // commenting out this return fixes the crash. Yes, it does!
        }
        setStrNoCheck(value);
    }

protected:
    virtual void setStrNoCheck(const char* value) {}
};

// ----------------------------------------------------------------------------------

struct Wrapper
{
    TypeTextFix _text;
};

int main()
{
    {
        Wrapper wrapped;
        wrapped._text.setString("123456789012", NULL);
    }
    // if I add a write to stdout here, it does not crash oO
    {
        Wrapper wrapped;
        wrapped._text.setString("123456789012", NULL); // without this line (or any one), the program runs just fine!
    }
}

Compile & run

g++ -O1 -Wall -Werror thebug.cpp && ./a.out
pure virtual method called
terminate called without an active exception
Aborted (core dumped)

This is actually minimal: if one removes any feature of this code, it runs correctly.

Analysis

The code snippet works fine when compiled with -O0. It also still works when compiled with -O0 plus every flag that the GCC documentation lists under -O1, so the crash only shows up with -O1 itself.

A core dump is generated from which one can extract the backtrace:

(gdb) bt
#0  0x0000003f93e32625 in raise () from /lib64/libc.so.6
#1  0x0000003f93e33e05 in abort () from /lib64/libc.so.6
#2  0x0000003f98ebea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x0000003f98ebcbd6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003f98ebcc03 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003f98ebd55f in __cxa_pure_virtual () from /usr/lib64/libstdc++.so.6
#6  0x00000000004007b6 in main ()

Feel free to ask for tests or details in the comments. Asked so far:

  • Is it the actual code? Yes, it is, byte for byte. I've checked and rechecked.

  • What exact version of GCC do you use?

    $ g++ --version
    g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
  • Can we see the generated assembly? Yes, here it is on pastebin.com

Answer 1:

This is a Red Hat-specific bug not present in FSF GCC. It is not a problem in your code.

On a system with both CentOS 6's GCC and FSF GCC 4.4.7 installed, I had each generate an assembly listing and compared the two. One difference jumps out:

CentOS 6's GCC generates

movq $_ZTV8BaseType+16, (%rsp)

whereas FSF GCC 4.4.7 generates

movq $_ZTV11TypeTextFix+16, (%rsp)

In other words, one of Red Hat's GCC patches makes it set up the vtable incorrectly. This instruction is part of your main function; you can see it in your own assembly listing shortly after .L48:.
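If the mangled names are hard to read: `_ZTV...` is the Itanium C++ ABI spelling of "vtable for ...", and the `+16` skips the two 8-byte header entries (offset-to-top and RTTI pointer) that precede the first virtual function slot on x86-64. A minimal sketch for decoding such symbols, assuming a GCC or Clang toolchain that ships `<cxxabi.h>`; the helper name `demangle` is made up for this example:

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ ABI header)
#include <cstdlib>    // std::free, NULL
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, NULL, NULL, &status);
    if (status != 0 || text == NULL) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer returned by __cxa_demangle is malloc-allocated
    return result;
}
```

Calling `demangle("_ZTV8BaseType")` should give `vtable for BaseType`, and `demangle("_ZTV11TypeTextFix")` should give `vtable for TypeTextFix`, which is the same thing `c++filt` prints on the command line.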

Red Hat applies many patches to its version of GCC, and some of them are patches that affect code generation. Unfortunately, one of them appears to have an unintended side effect.
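For readers wondering why a wrong vtable store ends in "pure virtual method called": while the base-class constructor runs, the object's vtable pointer refers to the base class's table, and it is only switched to the derived class's table afterwards. If that last store is wrong, as in the listing above, virtual calls keep going through BaseType's table, where the slot for the pure virtual setStrNoCheck points at __cxa_pure_virtual. The sketch below only shows the standard, well-defined part of this (the dynamic type during construction); `Base`, `Derived` and `seenInCtor` are hypothetical stand-ins, not the classes from the question:

```cpp
#include <string>

// Hypothetical stand-in for BaseType.
struct Base {
    std::string seenInCtor;
    // While this constructor runs, the object is still "a Base":
    // the virtual call below dispatches to Base::name, never to Derived::name.
    Base() { seenInCtor = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

// Hypothetical stand-in for TypeTextFix.
struct Derived : Base {
    // Once Derived's (implicit) constructor has run, the vtable pointer
    // refers to Derived's table and virtual calls reach this override.
    virtual std::string name() const { return "Derived"; }
};
```

After constructing a `Derived d`, `d.seenInCtor` holds "Base" while a call through a `Base&` returns "Derived". Had `name` been pure virtual in `Base`, the call made during construction would have had nowhere valid to go, which is the situation the miscompiled main ends up in after construction.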



Answer 2:

Though the true solution to this bug would be not to use Red Hat's GCC 4.4.7 (or any Red Hat compiler...), we are temporarily stuck with this version.

We did find a workaround: hide the constructor of BaseType from the compiler, which prevents it from over-optimizing it. We did this simply by defining BaseType::BaseType() in a separate translation unit.

Doing so bypasses the g++ bug. We checked that both the BaseType and TypeTextFix virtual table pointers are indeed written to the constructed object before calling its related constructors.
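A minimal sketch of that layout, assuming the classes from the question. The file names in the comments and the `present()` accessor are hypothetical, added only for illustration. Everything is shown as one listing for brevity; the workaround only takes effect when the part marked BaseType.cpp is really compiled as its own translation unit, so that the constructor body is not visible where the objects are created:

```cpp
#include <cstddef>   // NULL

// ---- BaseType.h (hypothetical file name) ----
class BaseType
{
public:
    BaseType();                  // declaration only: no inline body for the optimizer to see
    virtual ~BaseType() {}

    virtual void clear() {}

    virtual void setString(const char* value, const char* /*fieldName*/)
    {
        _present = (*value != '\0');
    }

    bool present() const { return _present; }   // hypothetical accessor, for checking only

protected:
    virtual void setStrNoCheck(const char* value) = 0;

protected:
    bool _present;
};

// ---- BaseType.cpp (hypothetical file name, compiled separately) ----
BaseType::BaseType() : _present( false ) {}

// ---- thebug.cpp ----
class TypeTextFix : public BaseType
{
public:
    virtual void clear() {}

    virtual void setString(const char* value, const char* fieldName)
    {
        clear();
        BaseType::setString(value, fieldName);
        if( _present == false ) {
            return;
        }
        setStrNoCheck(value);
    }

protected:
    virtual void setStrNoCheck(const char* /*value*/) {}
};
```

With the split in place, the two blocks of the original main (construct a TypeTextFix, call setString("123456789012", NULL)) ran without the abort for us. On a compiler without the bug, the listing behaves the same whether or not it is split.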