I'm currently updating a C++ library for Arduino (Specifically 8-bit AVR processors compiled using avr-gcc).
Typically the authors of the default Arduino libraries like to include an extern variable for the class inside the header, which is defined in the class .cpp file also. This I assume is basically to have everything provided ready to go for newbies as built-in objects.
The scenario I have is: The library I have updated no longer requires the .cpp file and I have removed it from the library. It wasn't until I went on a final pass checking for bugs that I realized, no linker error was produced despite the fact a definition wasn't provided for the extern
variable in a .cpp file.
This is as simple as I can get it (header file):
struct Foo{
void method() {}
};
extern Foo foo;
Including this code and using it in one or many source files does not cause any linker error. I have tried it in both versions of GCC which Arduino uses (4.3.7, 4.8.1) and with C++11 enabled/disabled.
In my attempt to cause an error, I found it was only possible when doing something like taking the address of the object or modifying the contents of a dummy variable I added.
After discovering this I find its important to note:
- The class functions only return other objects, as in, nothing like operators returning references to itself, or even a copy.
- It only modifies external objects (registers which are effectively
volatile uint8_t
references in code), and returns temporaries of other classes. - All of the class functions in this header are so basic that they cost less than or equal to the cost of a function call, therefore they are (in my tests) completely in-lined into the caller. A typical statement may create many temporary objects in the call chain, however the compiler sees through these and outputs efficient code modifying registers directly, rather than a set of nested function calls.
I also recall reading in n3797 7.1.1 - 8 that extern
can be used on incomplete types, however the class is fully defined whereas the declaration is not (this is probably irrelevant).
I'm led to believe that this may be a result of optimizations at play. I have seen the effect that taking the address has on objects which would otherwise be considered constant and compiled without RAM usage. By adding any layer of indirection to an object in which the compiler cannot guarantee state will cause this RAM consuming behavior.
So, maybe I've answered my question by simply asking it, however I'm still making assumptions and it bothers me. After quite some time hobby-coding C++, literally the only thing on my list of do-not's is making assumptions.
Really, what I want to know is:
- With respect to the working solution I have, is it a simple case of documenting the inability to take the address (cause indirection) of the class?
- Is it just an edge case behavior caused by optimizations eliminating the need for something to be linked?
- Or is plain and simple undefined behavior. As in GCC may have a bug and is permitting code that might fail if optimizations were lowered or disabled?
Or one of you may be lucky enough to be in possession of a decoder ring that can find a suitable paragraph in the standard outlining the specifics.
This is my first question here, so let me know if you would like to know certain details, I can also provide GitHub links to the code if needed.
Edit: As the library needs to be compatible with existing code I need to maintain the ability to use the dot syntax, otherwise I'd simply have a class of static functions.
To remove assumptions for now, I see two options:
- Add a .cpp just for the variable declaration.
- Use a define in the header like
#define foo (Foo())
allowing dot syntax via a temporary.
I prefer the method using a define, what does the community think?
Cheers.
Declaring something
extern
just informs the assembler and the linker that whenever you use that label/symbol, it should refer to entry in the symbol table, instead of a locally allocated symbol.The role of the linker is to replace symbol table entries with an actual reference to the address space whenever possible.
If you don't use the symbol at all in your C file, it will not show up in the assembly code, and thus will not cause any linker error when your module is linked with others, since there is no undefined reference.
It is either an edge case behaviour caused by optimization, or you never use the
foo
variable in your code. I'm not 100% sure it is formally not an undefined behavior, but i'm quite sure it isn't undefined from practical point of view.extern
variables are implemented in such way, that code compiled with them produces so-called relocations - empty places where addres of variable should be placed - which are then filled by linker. Apparentlyfoo
is never used in your code in such a way that would need getting it's address and therefore linker doesn't even try to find that symbol. If you turn optimization off (-O0) you will probably get linker error.Update: If you want to keep "dot notation" but remove the problem with undefined extern, you may replace
extern
withstatic
(in header file), creating separate "instance" of variable for each TU. As this variable is going to be optimized out anyway, this will not change the real code at all, but will also work for unoptimized build.