What i mean is that given a source code file is it possible to extract energy consumption levels for a particular code block or 1 single instruction, using a tool like perf?
相关问题
- Is shmid returned by shmget() unique across proces
- how to get running process information in java?
- Error building gcc 4.8.3 from source: libstdc++.so
- Why should we check WIFEXITED after wait in order
- Null-terminated string, opening file for reading
Use jRAPL which is a framework for profiling Java programs running on CPUs.
For example, the following code snippet attempts to measure the energy consumption of any code block, whose value is the difference between beginning and end:
And the detailed paper of this framework titled "Data-Oriented Characterization of Application-Level Energy Optimization" is in http://gustavopinto.org/lost+found/fase2015.pdf
There are tools for measuring power consumption (see @jww's comment for links), but they don't even try to attribute consumption to specific instructions the way
perf record
can statistically sample event -> instruction correlations.You can get an idea by running a whole block of the same instruction, like you'd do when trying to microbenchmark the throughput or latency of an instruction. Divide energy consumed by number of instructions executed.
But a significant fraction of CPU power consumption is outside of the execution units, especially for out-of-order CPUs running relatively cheap instructions (like scalar ADD / AND, or different memory subsystem behaviour triggered by different, like hardware prefetching).
Different patterns of data dependencies and latencies might matter. (Or maybe not, maybe out-of-order schedulers tend to be constant power regardless of how many instructions are waiting for their inputs to be ready, and setting up bypass forwarding vs. reading from the register file might not be significant.)
So a power or energy-per-instruction number is not directly meaningful, mostly only relative to a long block of dependent
AND
instructions or something. (Should be one of the lowest-power instructions, probably fewer transistors flipping inside the ALU than with ADD.) That's a good baseline for power microbenchmarks that run 1 instruction or uop per clock, but maybe not a good baseline for power microbenches where the front-end is doing more or less work.You might want to investigate how dependent AND vs. independent NOP or AND instructions affect energy per time or energy per instruction. (i.e. how does power outside the execution units scale with instructions-per-clock and/or register read / write-back.)