The GNU ld (linker script) manual, Section 3.5.5 "Source Code Reference", has some really important information on how to access linker script "variables" (which are actually just integer addresses) in C source code. I relied on it to use linker script variables extensively, and I wrote this answer here: How to get value of variable defined in ld linker script from C.
However, because this is a bit esoteric, it is easy to get it wrong by trying to access a linker script variable's value instead of its address. The manual (link above) says:
This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.
Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.
The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?
Quick refresher:
Imagine that in your linker script (ex: STM32F103RBTx_FLASH.ld) you have:
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 20K
}
/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);
And in your C source code you do:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
Sample printed output
(this is real output: it was actually compiled, run, and printed by an STM32 mcu):
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x20080000
<== NOTICE LIKE I SAID ABOVE: this one is completely wrong (even though it compiles and runs)!
Update:
Response to @Eric Postpischil's 1st comment:
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
Yes, but that's not my question. I'm not sure if you're picking up the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t, for ex, at __flash_start__, you'd do this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)
OR this:
extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye
But most definitely NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)
and NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right
I said in the question:
(See discussion under the question for how I came to this).
Looking specifically at #3 above:
Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But, it is NOT undefined behavior! What it is actually doing, instead, is reading the contents (value) of that address (0x8000000) as a uint32_t type. In other words, it's simply reading the first 4 bytes of the FLASH section, and interpreting them as a uint32_t. The contents (uint32_t value at this address) just so happen to be 0x20080000 in this case.
To further prove this point, the following are exactly identical:
The output is:
Notice they produce the same result. They each are producing a valid uint32_t-type value which is stored at address 0x8000000.
It just so turns out, however, that the u32_1 technique shown above is a more straightforward and direct way of reading the value, and again, it is not undefined behavior. Rather, it is correctly reading the value (contents) of that address.
I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced before that I was supposed to use only the u32_2 technique shown above, but it turns out they are both just fine, and again, the u32_1 technique is clearly more straightforward (there I go talking in circles again). :) Cheers.
Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?
One more little tidbit. I actually ran this test code on an STM32F777 mcu, which has 512KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is at 0x20000000 + 512K = 0x20080000. This just so happens to also be the contents of the first 4 bytes of the Vector Table, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".
I know that the Vector Table sits right at the start of the program memory, which is located in Flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack.
Furthermore, _estack is defined in my linker script as the address at the end of my RAM.
So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is set to be the initial stack pointer value, which is defined as _estack right in my linker script file, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!