Happily linking incompatible types leads to chaos

Posted 2019-05-08 15:09

Question:

I've been trying to figure out some boundaries of g++, especially linking (C++) object files. I found the following curiosity which I tried to compress as much as possible before asking.

Code

File common.h

#ifndef COMMON_H
#define COMMON_H

#include <iostream>

#define TMPL_Y(name,T) \
struct Y { \
  T y; \
  void f() { \
    std::cout << name << "::f " << y << std::endl; \
  } \
  virtual void vf() { \
    std::cout << name << "::vf " << y << std::endl; \
  } \
  Y() { \
    std::cout << name << " ctor" << std::endl; \
  } \
  ~Y() { \
    std::cout << name << " dtor" << std::endl; \
  } \
}

#define TMPL_Z(Z) \
struct Z { \
  Y* y; \
  Z(); \
  void g(); \
}

#define TMPL_Z_impl(name,Z) \
Z::Z() { \
  y = new Y(); \
  y->y = name; \
  std::cout << #Z << "(); sizeof(Y) = " << sizeof(Y) << std::endl; \
} \
void Z::g() { \
  y->f(); \
  y->vf(); \
}

#endif

File a.cpp compiled with g++ -Wall -c a.cpp

#include "common.h"

TMPL_Y('a',char);

TMPL_Z(Za);

TMPL_Z_impl('a',Za);

File b.cpp compiled with g++ -Wall -c b.cpp

#include "common.h"

TMPL_Y('b',unsigned long long);

TMPL_Z(Zb);

TMPL_Z_impl('b',Zb);

File main.cpp compiled and linked with g++ -Wall a.o b.o main.cpp

#include "common.h"

struct Y;
TMPL_Z(Za);
TMPL_Z(Zb);

int main() {
  Za za;
  Zb zb;
  za.g();
  zb.g();
  za.y = zb.y;
  return 0;
}

The result of ./a.out is

a ctor
Za(); sizeof(Y) = 8
a ctor  // <- mayhem
Zb(); sizeof(Y) = 12
a::f a
a::vf a
a::f b  // <- mayhem
a::vf b // <- mayhem

Question

Now, I would have expected g++ to call me some nasty names for trying to link a.o and b.o together. The assignment za.y = zb.y is especially evil. Not only does g++ not complain at all that I am asking it to link together incompatible types with the same name (Y), it also completely ignores the second definition in b.o (i.e. b.cpp).

I mean, I'm not doing anything so far-fetched. It is quite plausible that two compilation units use the same name for local classes, especially in a large project.

Is this a bug? Could anybody shed some light on the issue?

Answer 1:

In your example, you could put the definition of Y in an anonymous namespace like this:

#define TMPL_Y(name,T) \
namespace { \
    struct Y { \
      T y; \
      void f() { \
        std::cout << name << "::f " << y << std::endl; \
      } \
      virtual void vf() { \
        std::cout << name << "::vf " << y << std::endl; \
      } \
      Y() { \
        std::cout << name << " ctor" << std::endl; \
      } \
      ~Y() { \
        std::cout << name << " dtor" << std::endl; \
      } \
    }; \
}

This essentially creates a unique namespace for each compilation unit, so in effect each unit has its own Y, and the linker can resolve each unit's symbols to the right definition.

As for the statement

za.y = zb.y;

This will of course still yield unpredictable results, as the two types are incompatible.
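To illustrate why the assignment is the real problem, here is a minimal single-file sketch. It assumes two hypothetical named namespaces, unit_a and unit_b, standing in for the per-file Y types (these names are not in the original code). Once the two classes have distinct names, the compiler itself can see that a pointer to one does not convert to a pointer to the other:

```cpp
#include <type_traits>

// Hypothetical stand-in for the Y defined in a.cpp.
namespace unit_a {
struct Y {
  char y;
  virtual void vf() {}
  virtual ~Y() = default;
};
}  // namespace unit_a

// Hypothetical stand-in for the Y defined in b.cpp.
namespace unit_b {
struct Y {
  unsigned long long y;
  virtual void vf() {}
  virtual ~Y() = default;
};
}  // namespace unit_b

// The two classes are unrelated types, so the compiler can tell them apart.
static_assert(!std::is_same<unit_a::Y, unit_b::Y>::value,
              "the two Y types are distinct");

// Reports whether a unit_b::Y* could be implicitly assigned to a unit_a::Y*.
constexpr bool pointers_convert() {
  return std::is_convertible<unit_b::Y*, unit_a::Y*>::value;
}
```

Here pointers_convert() evaluates to false, so an assignment like za.y = zb.y would be rejected at compile time if Za and Zb were declared in terms of these distinct types. In the original main.cpp both members are declared as pointers to the same incomplete ::Y, which is why the compiler has nothing to object to.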



Answer 2:

Quoting Bjarne Stroustrup's "The C++ Programming Language":

9.2 Linkage

Names of functions, classes, templates, variables, namespaces, enumerations and enumerators must be used consistently across all translation units unless they are explicitly specified to be local.

It is the programmer's task to ensure that every namespace, class, function, etc. is properly declared in every translation unit in which it appears and that all declarations referring to the same entity are consistent. [...]
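As a rough sketch of what "explicitly specified to be local" can look like in practice, imagine the following is the content of a.cpp. The helper names describe, twice and report are made up for illustration. An unnamed namespace and the static keyword both keep a name local to the translation unit, so another file could define its own Y, describe or twice without clashing:

```cpp
#include <string>

// Option 1: names declared in an unnamed namespace are local to this
// translation unit.
namespace {
struct Y {
  char y;
};

std::string describe(const Y& v) {
  return std::string("a.cpp Y holds ") + v.y;
}
}  // namespace

// Option 2: `static` gives a namespace-scope function internal linkage.
static int twice(int x) { return 2 * x; }

// Only this function is meant to be visible to other translation units.
std::string report() {
  Y v{'a'};
  return describe(v) + ", twice(21) = " + std::to_string(twice(21));
}
```

Only report() has to be named consistently across the whole program; everything it uses internally stays private to the file.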



Answer 3:

There are many errors that a C++ compiler is not required to catch. Many of them, for example, cannot be detected by analyzing one translation unit at a time.

For example, leaving aside complex cases involving templates, suppose you just declare in a header file

void foo(int x);

and then provide two distinct definitions of the function in different translation units. The C++ implementation is not required to report an error, even at link time.

Note that this can easily happen by mistake: there could be two distinct headers that each declare a global function with the same signature, with part of the project using one header and part using the other.

The same can happen if you declare a class Foo in two different header files, with different declarations and different implementations.

This abuse of naming is simply a kind of error that the compiler is not required to catch.