Analyzing Code for Efficiency?

Published 2019-01-01 15:26

What kinds of tools do you use to determine the efficiency of code? Do you use home grown applications that run a statistically significant number of tests, or some commercial product? Do you use your own knowledge to test certain areas of your code, or some tool that analyzes your code for weak spots?

9 Answers
怪性笑人.
Answer 2 · 2019-01-01 16:05

This is called profiling. There are lots of off-the-shelf tools available to help you determine where the bottlenecks in your applications are, for all kinds of different languages. For example, the TPTP set of tools for Java can show you where performance bottlenecks are down to the individual method level, if you want it to. Of course, sometimes all you really need is a couple of reads of the system timer to get a general idea about a section of code.

看淡一切
Answer 3 · 2019-01-01 16:05

I use Valgrind and its tool Callgrind. It's a great tool. Valgrind is basically a virtual machine:

Valgrind is in essence a virtual machine using just-in-time (JIT) compilation techniques, including dynamic recompilation. Nothing from the original program ever gets run directly on the host processor. Instead, Valgrind first translates the program into a temporary, simpler form called Intermediate Representation (IR), which is a processor-neutral, SSA-based form. After the conversion, a tool (see below) is free to do whatever transformations it would like on the IR, before Valgrind translates the IR back into machine code and lets the host processor run it. Even though it could use dynamic translation (that is, the host and target processors are from different architectures), it doesn't. Valgrind recompiles binary code to run on host and target (or simulated) CPUs of the same architecture.

Callgrind is a profiler built upon that. The main benefit is that you don't have to run your application for hours to get reliable results; a few seconds are sufficient, because Callgrind instruments the code rather than sampling it.

Another tool built upon Valgrind is Massif. I use it to profile heap memory usage. It works great: it gives you snapshots of memory usage -- detailed information on WHAT holds WHAT percentage of memory, and WHO put it there.

One more Valgrind tool -- DRD (and Helgrind). I use them to track deadlocks and data races in my code, as well as thread API abuse.

牵手、夕阳
Answer 4 · 2019-01-01 16:08

The question is ambiguous. Efficient from a runtime or memory standpoint? The context in which a particular piece of code is run, the architecture of the app and the data that might get thrown at it are all factors too.

BTW, someone mentioned Purify; there's also its sister product Quantify.

浮光初槿花落
Answer 5 · 2019-01-01 16:11

I've heard good things about YourKit for Java profiling. For web sites, I've tried JMeter, which has basic functionality for simple parametrized profiling.

旧时光的记忆
Answer 6 · 2019-01-01 16:16

INSERTED: Let's look at "statistical significance".

Assume there's a function call instruction somewhere. You can't necessarily see it - a class, or a macro, or the compiler, may have slipped it in. There are other calls to the same function nearby, but this call is in a loop, or its arguments are such as to make this call take a long time. So much time in fact, that if this call could be made to take zero time, then total execution time would be reduced by some amount, say 90%. (Impossible? Not at all.) Would timing pinpoint it? No. Would a call graph pinpoint it? No. Would call counts pinpoint it? No. Because the problem is not at the function level, but at the call instruction level.

Somehow the program is randomly stopped at a point in time, and its state is examined. Will it have stopped in that 90% of the time that would be saved if the instruction could be "zeroed"? Of course - with 90% probability, and the instruction will be pinpointed on the stack waiting for its "work" to be completed.

In fact, if you stop it randomly 20 times, that instruction will be on the stack an average of 18 times, with a standard deviation of +/- 1.3 times.

Is that statistically significant? You bet it is.
Did you need a large number of samples? You bet you didn't.

Suppose the percentage is small, like 10% or 5%. The same principle applies.

In fact, no matter how few samples are taken, any instruction that is on >1 sample is statistically significant, and is a "hot spot", "bottleneck", or whatever you want to call it. If you could remove it, call it less, or somehow reduce it, it will save significant time. (Some you can't, like "call _main", but others you can. You just need to find them.)

Of course my code would never be so stupid, would it? Well then, prove it.

OK, now back to the story . . .

ORIGINAL ANSWER: I had heard of profilers, way back when, and I thought they must be pretty neat, but I didn't have access to them. I was working on an embedded processor (an Intel 8086 chip) that seemed to be taking an awfully long time painting some floating-point numbers on a display screen. The hardware guys suggested adding timer chips from their plentiful supply so I could see how long things were taking. However, one weekend I fired it up with an Intel "Blue Box" in-circuit emulator and got it running.

While it was being slow, I wondered "What the heck is it doing?". So I just halted it to find out. The PC was in the floating-point library (there was no FP chip). That was no surprise, since it was painting floating-point numbers, but I wanted to know more. So I (laboriously) read the hex memory so as to follow the call stack. Guess what? It was in the process of taking the number to be painted, dividing it by 10, converting to integer, converting back to float, subtracting, and so on, just to get the next digit to paint. Needless to say, there were better ways to do that, resulting in a speedup of around 10 times. That was found with one (1) sample!

On another occasion, on a 68K chip, there was some slowness. Again, a profiler was not available, but debugger "adb" was, so while it was being slow, I halted it a few times. The PC was in the math library, actually in the 32-bit integer multiply routine. Looking up the stack, I found this code:

struct {...} a[...];

int i;
for (i = 0; i < ...; ++i){ ... a[i] ... }

There's no call to multiply in there - what's going on? Turns out, for a[i], the compiler has to multiply i by the array element size. Since i is 32 bits (in that compiler) it generates a call to the 32-bit multiply routine, and the stack pinpoints that call instruction. By declaring i as short, the loop tripled in speed!

What's the point? If you take samples at random times while the program's being slow, the PC will tell you what it's doing, but the call stack will tell you why, and lead you straight to the problem. Now, unless the problem is really bad, you may need to take multiple samples. Any statement that shows up on >1 sample is one you can be suspicious of. Notice, it pinpoints statements, instructions even, not big chunks of code like functions. This technique may be "quick and dirty", but it is very effective.

ADDED: If you do this repeatedly, you can solve problem after problem in the same software. For example, if you get a speedup of 3 times, small performance problems that may have been insignificant before now consume 3 times more of the remaining time. That makes them much easier to hit with samples. You may have to add a temporary outer loop to make it run long enough to sample. In this way, I've seen compounded speedup factors of more than 40 times.

泛滥B
Answer 7 · 2019-01-01 16:17

For .NET I would recommend NDepend.
