I used to do all my Linux profiling with gprof.
However, with my multi-threaded application, it's output appears to be inconsistent.
Now, I dug this up:
http://sam.zoy.org/writings/programming/gprof.html
However, it's from a long time ago and in my gprof output, it appears my gprof is listing functions used by non-main threads.
So, my questions are:
1) In 2010, can I easily use gprof to profile multi-threaded Linux C++ applications? (Ubuntu 9.10) 2) What other tools should I look into for profiling?
Putting a slightly different twist on matters, you can actually get a pretty good idea as to what's going on in a multithreaded application using ftrace and kernelshark. Collecting the right trace and pressing the right buttons and you can see the scheduling of individual threads.
Depending on your distro's kernel you may have to build a kernel with the right configuration (but I think that a lot of them have it built in these days).
Have a look at Zoom.
A Paul R said, have a look at Zoom. You can also use lsstack, which is a low-tech approach but surprisingly effective, compared to gprof.
Added: Since you clarified that you are running OpenGL at 33ms, my prior recommendation stands. In addition, what I personally have done in situations like that is both effective and non-intuitive. Just get it running with a typical or problematic workload, and just stop it, manually, in its tracks, and see what it's doing and why. Do this several times. Now, if it only occasionally misbehaves, you would like to stop it only while it's misbehaving. That's not easy, but I've used an alarm-clock interrupt set for just the right delay. For example, if one frame out of 100 takes more than 33ms, at the start of a frame, set the timer for 35ms, and at the end of a frame, turn it off. That way, it will interrupt only when the code is taking too long, and it will show you why. Of course, one sample might miss the guilty code, but 20 samples won't miss it.
Microprofile is another possible answer to this. It requires hand-instrumentation of the code, but it seems like it handles multi-threaded code pretty well. And it also has special hooks for profiling graphics pipelines, including what's going on inside the card itself.
Since you didn't mention non-commercial, may I suggest Intel's VTune. It's not free but the level of detail is very impressive (and the overhead is negligible).
Edit: added another answer on poor man's profiler, which IMHO is better for multithreaded apps.
Have a look at oprofile. The profiling overhead of this tool is negligible and it supports multithreaded applications---as long as you don't want to profile mutex contention (which is a very important part of profiling multithreaded applications)