I've used PICs before and now I'm using with STM32F415. On a time-critical part of my code I need to put a very exact delay to adjust the period of the DAC-DMA that are working together to create a periodic analog signal.
The delay I want to add goes from 0 to 63 clock cycles (If I were able to do 10-63 clock cycles it would be OK aswell). In PIC24F assembly, there's the instruction "REPEAT" which allows me to repeat the next instruction a certain number of times. That would work great for me as I'd be able to just do:
REPEAT #0xNUMBER
NOP
I'm trying to find something similar with the STM32F4, but I had no luck searching in the instruction set, Reference Manual, and on the Internet in general.
I've already tried to use for/while loops in C and a timer dedicated to it, but the extra instructiosn required consume too much time (40-50 cycles depending on the way I program it).
If someone has an idea or knows how to do it, It would be very useful for me.
Thanks a lot.
English is not my mother-tongue language so I'm sorry for any possible mistakes. Let me know and I'll try to improve it :)
EDIT 1 (23-jul-17)
Thanks to everyone answering, I've been very busy and couldn't answer every one of you individually. I'll try @berendi solution of gated clocks, it seems as the best fit for my application. I'm learning a lot of things about the STM32 I didn't know, thank you everyone!
NOP is not a very good solution for the delay. Use barrier instructions instead as the execution time is exactly as stated in the ARM documentation (3, 4 or 5 cycles depending what instruction and the core version). You can place n consecutive barriers to archive the delay you need
So, if I'm understanding it correctly, you have Timer A controlling your DAC, triggering a conversion at each counter overflow, and you'd like to delay it for a variable number of clock cycles.
Most (if not all) timers of the STM32F4 support gated mode slave operation, where you can select another timer (Timer B) as a master, and Timer A counts only as long as the trigger output of Timer B is low. In other words, Timer A will stop counting on a rising edge from Timer B, and resume counting on a falling edge. Now, configure Timer B to output a single pulse when enabled, where the pulse width is the delay you want, then Timer A will be delayed for the exact duration of the pulse.
See the chapters on One-pulse mode, Timers and external trigger synchronization, and the description of the CR1, CR2, and SMCR registers in the reference manual.
On a PIC you can do this and it is a very common solution, the execution time was deterministic. Outside architectures like that, and older chips that were also deterministic (before clones came out) that would be okay as well. But in general that is not how you do a delay, it is not deterministic, you can get an "at least this long" for a tuned loop, but you cant get "exactly this long" even tuned, or should never expect to. That is why there are timers, multiple usually, in mcu designs and that is what you use for measuring time. For the problem you are trying to solve that is the solution here, one timer or cascaded timers if you really need that.
Arm does not have an x86 like repeat instruction your smallest loop is going to be two instructions and I have countless times demonstrated that on the same chip this loop can vary in speed, so tune it, add a line of code and the delay properties of this loop change
for classic (gas) syntax, for unified syntax use subs instead of sub.
You are also on an stm32 where they have a buried cache on the instruction side that you cannot turn off nor control, it generally gives you no wait state performance, certainly for things like this but obviously they dont have a cache the size of the flash so pre-fetch cycles have to happen somewhere, and you have to expect that sometimes you are going to have to feel that prefetch when you jump into this loop.