ARM M4 Instructions per Cycle (IPC) counters

Posted 2020-05-19 07:40

Question:

I would like to count the number of Instructions per Cycle executed on an ARM Cortex-M4 (or Cortex-M3) processor.

What is needed is the number of instructions (executed at runtime) of the code I want to profile and the number of cycles that the code takes to execute.

1 - Number of Cycles

Using the cycle counter is quite easy and straightforward.

volatile unsigned int *DWT_CYCCNT  ;
volatile unsigned int *DWT_CONTROL ;
volatile unsigned int *SCB_DEMCR   ;

void reset_timer(){
    DWT_CYCCNT   = (volatile unsigned int *)0xE0001004; //address of the register
    DWT_CONTROL  = (volatile unsigned int *)0xE0001000; //address of the register
    SCB_DEMCR    = (volatile unsigned int *)0xE000EDFC; //address of the register
    *SCB_DEMCR   = *SCB_DEMCR | 0x01000000;
    *DWT_CYCCNT  = 0; // reset the counter
    *DWT_CONTROL = 0; 
}

void start_timer(){
    *DWT_CONTROL = *DWT_CONTROL | 1 ; // enable the counter
}

void stop_timer(){
    *DWT_CONTROL = *DWT_CONTROL | 0 ; // disable the counter    
}

unsigned int getCycles(){
    return *DWT_CYCCNT;
}

main(){
    ....
    reset_timer(); //reset timer
    start_timer(); //start timer
    //Code to profile
    ...
    myFunction();
    ...
    stop_timer(); //stop timer
    numCycles = getCycles(); //read number of cycles 
    ...
}

2 - Number of Instructions

I found some documentation online on how to count the number of instructions executed by the ARM Cortex-M3 and Cortex-M4 (link):

  # instructions = CYCCNT - CPICNT - EXCCNT - SLEEPCNT - LSUCNT + FOLDCNT
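The formula can be applied to a snapshot of the counter values. The following is a minimal sketch, assuming the counters have already been read into a struct and that none of the 8-bit event counters wrapped during the measured interval. The names `dwt_snapshot`, `dwt_instruction_count` and `dwt_ipc` are hypothetical and not part of any vendor header.

```c
#include <stdint.h>

/* Hypothetical snapshot of the DWT counters, taken after the profiled code. */
struct dwt_snapshot {
    uint32_t cyccnt;   /* DWT_CYCCNT, 32-bit cycle counter */
    uint8_t  cpicnt;   /* DWT_CPICNT, 8-bit */
    uint8_t  exccnt;   /* DWT_EXCCNT, 8-bit */
    uint8_t  sleepcnt; /* DWT_SLEEPCNT, 8-bit */
    uint8_t  lsucnt;   /* DWT_LSUCNT, 8-bit */
    uint8_t  foldcnt;  /* DWT_FOLDCNT, 8-bit */
};

/* instructions = CYCCNT - CPICNT - EXCCNT - SLEEPCNT - LSUCNT + FOLDCNT */
static uint32_t dwt_instruction_count(const struct dwt_snapshot *s)
{
    return s->cyccnt - s->cpicnt - s->exccnt - s->sleepcnt - s->lsucnt
           + s->foldcnt;
}

/* Instructions per cycle; returns 0.0 if no cycles were counted. */
static double dwt_ipc(const struct dwt_snapshot *s)
{
    if (s->cyccnt == 0u)
        return 0.0;
    return (double)dwt_instruction_count(s) / (double)s->cyccnt;
}
```

The result is only meaningful if every 8-bit counter stayed below 256 events, which limits this direct approach to very short code sequences.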

The registers that they mention are documented here (from page 11-13) and these are the memory addresses to access them:

DWT_CYCCNT   = 0xE0001004
DWT_CONTROL  = 0xE0001000
SCB_DEMCR    = 0xE000EDFC
DWT_CPICNT   = 0xE0001008
DWT_EXCCNT   = 0xE000100C
DWT_SLEEPCNT = 0xE0001010
DWT_LSUCNT   = 0xE0001014
DWT_FOLDCNT  = 0xE0001018

The DWT_CONTROL register is used to enable counters, especially cycle counter as documented here.
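As a sketch of what enabling the counters might look like, the helpers below build the DWT_CONTROL value with the cycle counter and the five event counters switched on. The bit positions follow the DWT_CTRL layout in the ARMv7-M architecture (CYCCNTENA at bit 0, CPIEVTENA to FOLDEVTENA at bits 17 to 21); the macro and function names are hypothetical, and the positions should be checked against the reference manual of the actual device.

```c
#include <stdint.h>

/* DWT_CTRL enable bits (hypothetical macro names, ARMv7-M bit positions). */
#define DWT_CTRL_CYCCNTENA_BIT   (1u << 0)   /* cycle counter */
#define DWT_CTRL_CPIEVTENA_BIT   (1u << 17)  /* CPI counter */
#define DWT_CTRL_EXCEVTENA_BIT   (1u << 18)  /* exception overhead counter */
#define DWT_CTRL_SLEEPEVTENA_BIT (1u << 19)  /* sleep counter */
#define DWT_CTRL_LSUEVTENA_BIT   (1u << 20)  /* load/store unit counter */
#define DWT_CTRL_FOLDEVTENA_BIT  (1u << 21)  /* folded-instruction counter */

#define DWT_CTRL_ALL_COUNTERS (DWT_CTRL_CYCCNTENA_BIT | DWT_CTRL_CPIEVTENA_BIT | \
                               DWT_CTRL_EXCEVTENA_BIT | DWT_CTRL_SLEEPEVTENA_BIT | \
                               DWT_CTRL_LSUEVTENA_BIT | DWT_CTRL_FOLDEVTENA_BIT)

/* Return the control value with all profiling counters enabled. */
static uint32_t dwt_ctrl_enable_counters(uint32_t ctrl)
{
    return ctrl | DWT_CTRL_ALL_COUNTERS;
}

/* Return the control value with all profiling counters disabled. */
static uint32_t dwt_ctrl_disable_counters(uint32_t ctrl)
{
    return ctrl & ~DWT_CTRL_ALL_COUNTERS;
}
```

On the target this would be used as `*DWT_CONTROL = dwt_ctrl_enable_counters(*DWT_CONTROL);`, after the trace enable bit (0x01000000) has been set in SCB_DEMCR.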

But when I tried to put it all together to count the number of instructions executed per cycle, I didn't succeed.

Here is a short guide on how to use them from gdb.

What makes this difficult is that some of the registers are 8-bit registers (DWT_CPICNT, DWT_EXCCNT, DWT_SLEEPCNT, DWT_LSUCNT, DWT_FOLDCNT), and when they overflow they trigger an event. I didn't find a way to collect that event: there are no code snippets that explain how to do it, nor interrupt routines suitable for it.
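One alternative that avoids the overflow event altogether is to poll an 8-bit counter from software and accumulate the differences. This is a different technique from collecting the event, shown only as a sketch: the `wide_counter` type and its functions are hypothetical, and the total is correct only if fewer than 256 events occur between two consecutive polls, which may be impractical for counters that advance on most cycles.

```c
#include <stdint.h>

/* Hypothetical software extension of one 8-bit DWT event counter. */
struct wide_counter {
    uint8_t  last;   /* raw 8-bit value seen at the previous poll */
    uint32_t total;  /* events accumulated since initialisation */
};

/* Start accumulating from the current raw register value. */
static void wide_counter_init(struct wide_counter *w, uint8_t raw)
{
    w->last = raw;
    w->total = 0u;
}

/* Add the events counted since the previous poll.
 * The cast makes the subtraction modulo 256, so a single wrap of the
 * hardware counter between two polls is handled correctly. */
static void wide_counter_poll(struct wide_counter *w, uint8_t raw)
{
    w->total += (uint8_t)(raw - w->last);
    w->last = raw;
}
```

On the target, `raw` would be the low byte read from a register such as DWT_LSUCNT. The polling itself consumes cycles, so it perturbs the measurement it is trying to make.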

Moreover, it seems that setting watchpoints from gdb on the addresses of those registers doesn't work: gdb does not stop when the registers change value. E.g. on DWT_LSUCNT:

(gdb) watch *0xE0001014

Update: I found this project on GitHub explaining how to use the DWT, ITM and ETM units, but I haven't checked whether it works. I will post updates.

Any idea on how to use them?

Thank you!

Answer 1:

The code sample you provided has a problem in clearing the enable bit. You should clear the bit using 'AND', not 'OR':

*DWT_CONTROL = *DWT_CONTROL & 0xFFFFFFFE ; // disable the counter by clearing the enable bit


Answer 2:

If you want an accurate cycle count, I think using a debugger is a good choice. Keil-MDK can accumulate the state register so that it does not overflow, and the result in the debugger is the same as the result obtained with the DWT.

If you want to measure the other values, e.g. FOLDCNT, use the trace feature in Keil-MDK: Debug -> Setting -> Trace -> Trace Enable.

With that enabled, choose trace events in the Trace window while debugging; Keil can then collect the values of those 8-bit registers and add them together.

It may seem a little clumsy, but I don't know how to collect the overflow event. I think this event can only be sent to the ITM, because both the DWT and the ITM are separate components outside the program. If we collect the event in the user program, the act of collecting it will inevitably affect the accuracy of the result.

ITM? ETM? CoreSight? DWT?AHB?



Answer 3:

I have no idea how to use the registers the way you want to use them. But, here is how I deal with measuring cycles.

Make sure you enable the counter in the SysTick Control and Status Register. With the appropriate headers, you should have access to the SysTick registers as a structure.

Measure the number of cycles taken by the counter function. This is later subtracted from any measurements.

  SysTick->VAL = 0; // set 0
  // Measure delay on measurement  
  __disable_irq();
  a = (uint32_t) SysTick->VAL;
  //... measuring zero instructions
  b = (uint32_t) SysTick->VAL;
  __enable_irq();
  measure_delay = a - b;

Now measure a function.

SysTick->VAL = 0;
__disable_irq();
a = (uint32_t) SysTick->VAL;

// Assuming this function doesn't require interrupts

// INSERT CODE TO BE PROFILED
function_to_be_examined();

b = (uint32_t) SysTick->VAL;
__enable_irq();
cycles_profiled_code = a - b - measure_delay;
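Note that `a - b` is only correct while the SysTick counter has not wrapped between the two reads. SysTick counts down and reloads from its reload register after reaching zero, so a wrap-tolerant difference can be sketched as follows. The helper name `systick_elapsed` is hypothetical, `reload` is assumed to be the value programmed into `SysTick->LOAD`, and at most one wrap is assumed to occur during the measurement.

```c
#include <stdint.h>

/* Ticks elapsed between two reads of a down counter that reloads to
 * `reload` after reaching zero. Assumes the counter wrapped at most once. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    if (start >= end)
        return start - end;              /* no wrap between the two reads */
    return start + (reload + 1u) - end;  /* wrapped once through zero */
}
```

With the maximum 24-bit reload value of 0x00FFFFFF this covers code sequences of up to about 16.7 million cycles; for example, `cycles_profiled_code = systick_elapsed(a, b, 0x00FFFFFF) - measure_delay;`.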

I hope it helps.