Is it possible to extract instruction specific ene

2019-09-19 15:02发布

What i mean is that given a source code file is it possible to extract energy consumption levels for a particular code block or 1 single instruction, using a tool like perf?

2条回答
Luminary・发光体
2楼-- · 2019-09-19 15:32

Use jRAPL which is a framework for profiling Java programs running on CPUs.

For example, the following code snippet attempts to measure the energy consumption of any code block, whose value is the difference between beginning and end:

double beginning = EnergyCheck.statCheck();
doWork();
double end = EnergyCheck.statCheck();
System.out.println(end - beginning);

And the detailed paper of this framework titled "Data-Oriented Characterization of Application-Level Energy Optimization" is in http://gustavopinto.org/lost+found/fase2015.pdf

查看更多
放我归山
3楼-- · 2019-09-19 15:35

There are tools for measuring power consumption (see @jww's comment for links), but they don't even try to attribute consumption to specific instructions the way perf record can statistically sample event -> instruction correlations.

You can get an idea by running a whole block of the same instruction, like you'd do when trying to microbenchmark the throughput or latency of an instruction. Divide energy consumed by number of instructions executed.

But a significant fraction of CPU power consumption is outside of the execution units, especially for out-of-order CPUs running relatively cheap instructions (like scalar ADD / AND, or different memory subsystem behaviour triggered by different, like hardware prefetching).

Different patterns of data dependencies and latencies might matter. (Or maybe not, maybe out-of-order schedulers tend to be constant power regardless of how many instructions are waiting for their inputs to be ready, and setting up bypass forwarding vs. reading from the register file might not be significant.)

So a power or energy-per-instruction number is not directly meaningful, mostly only relative to a long block of dependent AND instructions or something. (Should be one of the lowest-power instructions, probably fewer transistors flipping inside the ALU than with ADD.) That's a good baseline for power microbenchmarks that run 1 instruction or uop per clock, but maybe not a good baseline for power microbenches where the front-end is doing more or less work.

You might want to investigate how dependent AND vs. independent NOP or AND instructions affect energy per time or energy per instruction. (i.e. how does power outside the execution units scale with instructions-per-clock and/or register read / write-back.)

查看更多
登录 后发表回答