Problem:
The firmware image generated when linking with a static library is different to the firmware image generated when linking with the objects directly extracted from the static library.
Both firmware images link without error and load successfully onto the microcontroller.
The latter binary (linked with objects) executes successfully and as expected, while the former (linked to the static library) does not.
The only warnings during compilation are unused-but-set-variable in the manufacturer-supplied HAL (the variables concerned are not needed by the compiled implementation because of various macro definitions) and unused-parameter in various weak functions, also within the manufacturer-supplied HAL.
Description:
I am developing an embedded application for the STM32F407. Until now I have been working with one code base including the microcontroller's HAL & setup code, a driver for a specific peripheral, and an application utilizing the former two.
Since I wish to develop multiple applications using the same driver & HAL (both are complete and tested, so won't change often), I wish to compile & distribute the HAL and driver as a static library, which can then be linked with the application source.
The problem is that when linking the application and static library, the firmware image does not execute correctly on the microcontroller. When linking the application and the object files directly extracted from the static library, the firmware image executes as expected.
Specifically:
Created binary does not work when linking with static library using:
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) Library/libtest.a
Created binary works when linking with objects extracted from static library using:
@cd Library && $(AR) x libtest.a && cd ..
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) Library/*.o
In both cases:
CFLAGS = $(INCLUDES) $(DEFS) -ggdb3 -O0 -std=c99 -Wall -specs=nano.specs -nodefaultlibs
CFLAGS+= -fdata-sections -ffunction-sections -mcpu=cortex-m4 -march=armv7e-m -mthumb
CFLAGS+= -mfloat-abi=hard -mfpu=fpv4-sp-d16 -MD -MP -MF $@.d
LDFLAGS = -T$(LDSCRIPT) -Wl,-static -Wl,-Map=$(@:.elf=.map),--cref -Wl,--gc-sections
I have compared the outputs of -Wl,--print-gc-sections as well as the app.map file, but enough is different between the two builds that no one thing jumps out as being wrong. I have also tried without -Wl,--gc-sections, to no avail.
The output of arm-none-eabi-size for the two firmware images is:
   text    data     bss     dec     hex filename
  43464      76    8568   52108    cb8c workingapp.elf
  17716      44    8568   26328    66d8 brokenapp.elf
A similar size discrepancy can be seen when compiling without -Wl,--gc-sections.
Using arm-none-eabi-gdb to debug the microcontroller's execution, the faulty firmware image enters an infinite loop when the WWDG interrupt occurs. This interrupt is not enabled in the firmware and thus the interrupt handler defaults to the Default_Handler (an infinite loop). This interrupt does not occur when running the working firmware image.
The WWDG interrupt occurring is actually a red herring, as described in the accepted answer
--Mike
Summary:
The issue was that not all objects from the static library were being included in the firmware image. This is solved by surrounding the static library with the --whole-archive and --no-whole-archive linker flags:
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) -Wl,--whole-archive Library/libtest.a -Wl,--no-whole-archive
The issue arises because if the linker includes a library object with weak symbol definitions, it considers these symbols defined, and no longer searches for their (strong) definitions. Hence the object with strong definitions may or may not be included, depending on search order and what other symbols it defines.
Solution path:
Using arm-none-eabi-gdb to debug, it appeared that the disabled WWDG interrupt was occurring and calling the Default_Handler. This turned out to be a red herring, and one that has occurred often enough that it led me to the answer via the "STM32 WWDG interrupt firing when not configured" Stack Overflow post.
Upon reading this post and learning that the gdb function name reporting is often inaccurate for functions that share the same memory address, I checked the generated .map file for the faulty firmware image and confirmed that the WWDG_IRQHandler was located at the same memory address as the majority of IRQHandlers, including the IRQHandlers for interrupts that are defined and used by the system (e.g. some timer interrupts).
Furthermore, all interrupts defined in the stm32f4xx_it.o object (which defines the IRQHandlers for interrupts used by the system, and which is included in the static library) pointed to the memory address of the Default_Handler, and the respective IRQHandler symbols were listed as being supplied by startup_stm32f407xx.o.
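The same check can be made without reading the .map file by hand. The following is a minimal sketch, assuming the default three-column output of nm (address, type, name); the function name shared_addresses and the file name app.elf are illustrative only. It lists every address that carries more than one symbol name, which is where gdb's function name reporting becomes ambiguous.

```shell
# Sketch: report addresses that carry more than one symbol name.
# Reads default "nm" output (address, type, name) on stdin.
# The function name shared_addresses is illustrative, not part of any toolchain.
shared_addresses() {
    awk 'NF == 3 {
             names[$1] = names[$1] " " $3   # collect the names seen at each address
             count[$1]++
         }
         END {
             for (addr in count)
                 if (count[addr] > 1)
                     print addr ":" names[addr]
         }' | sort
}

# Usage (assumes the ELF is called app.elf):
#   arm-none-eabi-nm app.elf | shared_addresses
```

In the faulty image, the handlers that should come from stm32f4xx_it.o would be expected to show up on the same line as Default_Handler.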
I then checked which object files were actually linked into the firmware image (perl -n -e '/libtest\.a\((.*?)\)/ && print "$1\n"' app.map | sort -u) and found that only a subset of objects were linked.
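To see which members were left out rather than which were included, the list from the map file can be compared with the archive's table of contents. This is a rough sketch, assuming a GNU ld map file in which linked members appear as libtest.a(member.o); the function names and members.txt are made up for illustration.

```shell
# linked_members MAPFILE ARCHIVE
# Print the archive members named in a GNU ld map file.
# Note: ARCHIVE is used as a regular expression, so "." matches any character.
linked_members() {
    sed -n "s/.*$2(\([^)]*\)).*/\1/p" "$1" | LC_ALL=C sort -u
}

# missing_members MAPFILE ARCHIVE MEMBERLIST
# Print the members listed in MEMBERLIST that never appear in the map file.
missing_members() {
    linked=$(mktemp)
    linked_members "$1" "$2" > "$linked"
    # comm -13 keeps only lines unique to the second input (the full member list)
    LC_ALL=C sort -u "$3" | LC_ALL=C comm -13 "$linked" -
    rm -f "$linked"
}

# Usage (file names are assumptions):
#   arm-none-eabi-ar t Library/libtest.a > members.txt
#   missing_members app.map libtest.a members.txt
```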
Further inspection of startup_stm32f407xx.s showed that it defines many weak symbols, e.g.:
.weak TIM2_IRQHandler
During the process of linking a static library, the linker searches the library for undefined symbols and includes the first object it finds to define these symbols. It then removes the symbol from the undefined list, as well as any other undefined symbols that are defined by the included object.
My guess as to what happened is that the linker found an otherwise-undefined symbol in startup_stm32f407xx.o and included the object. It then considered all IRQHandler symbols to be defined by the weak definitions therein. The object stm32f4xx_it.o was never included, since it did not define any symbol that was still undefined. This happened a number of times, with a number of different object files; sometimes the strong symbols were included, sometimes the weak symbols were included, depending on which object was searched first. Interesting (yet unsurprising) is that if the weak definition is removed, the object containing the strong definition is included, and all strong definitions from that file (correctly) override the already-included weak definitions.
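The behaviour described above can be reproduced on a host machine. The following is a small sketch, not the original project: it assumes a native gcc, ar and a GNU-compatible linker on an ELF platform, and every file and function name in it (startup.c, it.c, start, repro_weak_archive) is invented for illustration. startup.c plays the role of startup_stm32f407xx.s and it.c the role of stm32f4xx_it.c.

```shell
# Hypothetical host-side reproduction; all names are illustrative.
# Runs in a subshell so that "cd" and "set -e" do not affect the caller.
repro_weak_archive() (
    set -e
    dir=$(mktemp -d)
    cd "$dir"

    # startup.c: weak default handler plus a table that references it
    printf '%s\n' \
        '#include <stdio.h>' \
        '__attribute__((weak)) void TIM2_IRQHandler(void) { puts("weak default"); }' \
        'void (*const vector_table[])(void) = { TIM2_IRQHandler };' \
        'void start(void) { vector_table[0](); }' > startup.c

    # it.c: the strong handler that is meant to override the weak one
    printf '%s\n' \
        '#include <stdio.h>' \
        'void TIM2_IRQHandler(void) { puts("strong handler"); }' > it.c

    # main.c: only references start(), never the handler itself
    printf '%s\n' \
        'void start(void);' \
        'int main(void) { start(); return 0; }' > main.c

    gcc -c startup.c it.c main.c
    ar rcs libtest.a startup.o it.o

    # Same three link variants as discussed in the question
    gcc -o from_archive main.o libtest.a
    gcc -o from_objects main.o startup.o it.o
    gcc -o whole_archive main.o -Wl,--whole-archive libtest.a -Wl,--no-whole-archive

    echo "archive:       $(./from_archive)"
    echo "objects:       $(./from_objects)"
    echo "whole-archive: $(./whole_archive)"

    cd /
    rm -rf "$dir"
)

# Usage:
#   repro_weak_archive
```

With a GNU toolchain, the plain archive link should report the weak default, because nothing that is still undefined pulls it.o out of the archive, while the other two links should report the strong handler.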
Having solved the problem, I'm not sure where to go from here. Is this a linker bug?
You'll get a better answer if you can explain what "the binary doesn't work" really means.
Are you getting a binary that your programming tools won't load into the chip at all?
If so, look carefully at linker output on the command line.
Are you producing something you can load into the chip and not seeing the expected behavior?
If so, use a hardware debugger. Step through the code until something breaks, or let it run, then halt it and see where you ended up.
Chances are, you're just uncovering a bug that's always been in the code by rearranging where everything goes in memory. Array overflows, bad pointer dereferences, and uninitialized variables are typical culprits. Switching on -Wextra and -Wall can help uncover this stuff.
One other thought: Make sure your LDSCRIPT has the correct flash & RAM sizes for the actual part number (i.e. is not for a different part in the family).
I also currently work with that MCU. However, I avoid the ST "standard" library for good reasons.
It looks as if the watchdog has been enabled during startup and expires soon afterwards (the interrupt is an early warning). This may be due to variations in run-time behaviour, which might well depend on linkage because of trampolines created by the linker, link-time optimization (LTO), inlining by the compiler, and other optimizations.
The sizes given seem to be out of bounds for normal variation with identical compile/link options, but they are quite possible for -Os vs. -O3 and LTO vs. no LTO (for the latter, the resulting code may well be larger or smaller, depending on -O). Also, I have noticed that some gcc/ld versions have problems with LTO, and all code has to be compiled and linked(!) with the same options. Also check the ABI used and that it matches the libraries (C and gcc libs) used.
A good start would be to coarse-step through startup from reset with a watchpoint at WWDG->CR. Also check the EWI-bit; this would actually allow the interrupt.