I have a real-time embedded app with the major cycle running at 10 kHz. It runs on a TI TMS320C configured to boot from flash. I recently added an initialized array to a source file, and suddenly the timing is off (in a way too complex to explain well; essentially, a serial port write is no longer completing on time).
The things about this that baffle me:
- I'm not even accessing the new data, just declaring an initialized array.
- It is size dependent - the problem only appears if the array is larger than 40 words.
- I know I'm not overflowing any data segments in the link map.
- There is no data caching, so it's not due to disrupting cache consistency.
Any ideas on how simply increasing the size of the .cinit segment in flash can affect the timing of my code?
Additional Info:
I considered that maybe the code had moved, but it is well separated from the data. I verified through the memory map that all the code segments have the same addresses before and after the bug. I also verified that none of the segments are full - the only addresses that change in the map are a handful in the .cinit section. That section contains the data values used to initialize variables in RAM (like my array). It should never be accessed after main() is called.
It could be a bank or page conflict as well. Maybe you have two routines that are called quite often (interrupt handlers or so) that used to be in the same page and are now split across two pages.
My suspicions would point to a change in alignment between your data/code and the underlying media/memory. Adding to your data would change the locations of memory in your heap (depending on the memory model) and might put your code across a 'page' boundary on the flash device, causing latency that was not there before.
After more than a day staring at traces and generated assembly, I think I figured it out. The root cause turned out to be a design issue that caused glitches only when the ISR that kicked off the serial port write collided with a higher-priority one. The timing just happened to work out so that it only took adding a few extra instructions to one loop to make the two interrupts collide.
So the question becomes: How does storing, but not accessing, additional data in flash memory cause additional instructions to be executed?
It appears that the answer is related to, but not quite the same as, the suggestions by Frosty and Frederico. The new array does move some existing variables, but not across page boundaries or into slower regions (on this board, access times should be the same for all regions). It does, however, change the offsets of some frequently accessed structures, which causes the optimizer to emit slightly different instruction sequences for accessing them. One data alignment may cause a one-cycle pipeline stall where the other does not, and those few instructions shifted the timing enough to expose the underlying problem.
I would go out on a limb and claim that you don't have a performance problem here, but rather some kind of memory corruption whose symptoms look like a performance problem. Adding an array to your executable changes the memory picture. So my guess is that you have a memory corruption that is mostly harmless (i.e., it overwrites an unused part of memory), and shifting your memory by more than 40 words turns that corruption into a bigger problem. Which one it is remains the real question.
Perhaps the new statically allocated array pushes existing data into slower memory regions, causing accesses to that data to be slower?
Does the problem recur if the array is the last thing in its chunk of address space? If not, then using your map, try moving the array declaration so that, one by one, things placed after it are shuffled to be before it instead. That way you can pinpoint the relevant object and begin to work out why moving it causes the delay.