I'm working on a complex network software and I have trouble determining how to improve the systems performance.
Specifically in one part of the software which is using blocking synchronous calls. Since this part of the system is doing heavy computations it's nearly impossible to determine whether the slowness of this component is caused by these computations or the waiting for the other parts of the system.
Are there any light-weight profilers that can capture this information? I can't use heavy duty profile like valgrind since that would completely skew the results (although valgrind would be perfect, since it captures all the required information).
I tried using oProfile but I just wasn't able to get any meaningful results out of it (perhaps if there is a concise tutorial somewhere...).