Increasing the depth of cProfiler in Python to rep

I'm trying to profile a function that calls other functions. I call the profiler as follows:

from mymodule import foo
def start():
   # ...
   foo()

import pstats
import cProfile as profile
profile.run('start()', output_file)
p = pstats.Stats(output_file)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)

I find that the profiler says all the time was spent in function "foo()" of mymodule, instead of breaking it down into the subfunctions foo() calls, which is what I want to see. How can I make the profiler report the performance of these functions?

Thanks.

Answer 1:

You need p.print_callees() to get a hierarchical breakdown of method calls. The output is fairly self-explanatory: the left column shows the function of interest (e.g. foo()), and the right column lists every sub-function it called, together with the total and cumulative time attributed to each one within that caller. Breakdowns for those sub-calls are included as well.
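
As a minimal sketch of what that looks like (Python 3 syntax; foo, helper_a and helper_b are made-up stand-ins, not the Asker's real functions), the following profiles a small call tree and prints only the callees of foo(). The regular expression passed to print_callees() is just a restriction on which callers get listed:

```python
import cProfile
import io
import pstats


def helper_a():
    # Hypothetical sub-function doing some arithmetic.
    return sum(i * i for i in range(20000))


def helper_b():
    # Hypothetical sub-function doing some sorting.
    return sorted(range(20000), reverse=True)


def foo():
    helper_a()
    helper_b()


profiler = cProfile.Profile()
profiler.runcall(foo)

# Send the report to a string buffer instead of stdout so it can be inspected.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)

# Only list callers whose "filename:lineno(function)" text matches "(foo)".
stats.print_callees(r'\(foo\)')
callee_report = stream.getvalue()
print(callee_report)
```

The mirror-image method, print_callers(), answers the opposite question: who called a given function.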

Answer 2:

First, I want to say that I was unable to replicate the Asker's issue. The profiler (in py2.7) definitely descends into the called functions and methods. (The docs for py3.6 look identical, but I haven't tested on py3.) My guess is that by limiting the output to the top 10 entries, sorted by cumulative time, the first N of those were very high-level functions that were each called only once or a handful of times, and the functions called by foo() dropped off the bottom of the list.
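
To illustrate how a numeric limit can hide entries, here is a small sketch (Python 3 syntax; top, middle and leaf are hypothetical functions, not the Asker's code). print_stats() also accepts a regular expression as a restriction, matched against the filename:lineno(function) text, so a specific function can be pulled out no matter where it lands in the sort order:

```python
import cProfile
import io
import pstats


def leaf():
    return sum(range(5000))


def middle():
    return [leaf() for _ in range(20)]


def top():
    return middle()


profiler = cProfile.Profile()
profiler.runcall(top)


def report(*restrictions):
    """Return the cumulative-sorted report, limited by the given restrictions."""
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(*restrictions)
    return stream.getvalue()


# A numeric limit keeps only the first rows of the sorted list.
top_one = report(1)

# A regex restriction keeps only the rows whose printed name matches.
only_leaf = report(r'\(leaf\)')

print(top_one)
print(only_leaf)
```

With the limit of 1, leaf() is cut from the listing even though it was profiled; the regex version shows it directly.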

I decided to play with some big numbers for testing. Here's my test code:

# file: mymodule.py
import math

def foo(n=5):
    for i in xrange(1, n):
        baz(i)
        bar(i ** i)

def bar(n):
    for i in xrange(1, n):
        e = exp200(i)
        print "len e: ", len("{}".format(e))

def exp200(n):
    result = 1
    for i in xrange(200):
        result *= n
    return result

def baz(n):
    print "{}".format(n)

And the including file (very similar to Asker's):

# file: test.py

from mymodule import foo

def start():
   # ...
   foo(8)

OUTPUT_FILE = 'test.profile_info'

import pstats
import cProfile as profile

profile.run('start()', OUTPUT_FILE)
p = pstats.Stats(OUTPUT_FILE)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)
print "time (top 10): "
p.sort_stats('time').print_stats(10)

Notice the last line. I added a view sorted by time, which is the total time spent in the function "excluding time made in calls to sub-functions". I find this view much more useful, as it tends to favor the functions that are doing actual work, and may be in need of optimization.
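
The difference between the two sort orders can be seen in a tiny sketch (Python 3 syntax; outer and inner are hypothetical names). outer() does nothing but call inner(), so it has a large cumulative time and almost no internal time, while inner() does all the real work:

```python
import cProfile
import io
import pstats


def inner():
    # All of the actual work happens here.
    total = 0
    for i in range(200000):
        total += i * i
    return total


def outer():
    # Thin wrapper: its own (internal) time is close to zero.
    return inner()


profiler = cProfile.Profile()
profiler.runcall(outer)


def report(sort_key):
    """Return the full stats report sorted by the given key."""
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats(sort_key).print_stats()
    return stream.getvalue()


by_cumulative = report('cumulative')  # wrappers float to the top
by_tottime = report('tottime')        # functions doing the work float to the top

print(by_cumulative)
print(by_tottime)
```

In the cumulative listing outer() appears above inner(); in the tottime listing the order flips.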

Here's the part of the results that the Asker was working from (cumulative-sorted):

cumulative (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

See how the top 3 functions in this display were only called once. Let's look at the time-sorted view:

time (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now the number one entry makes sense. Obviously raising something to the 200th power by repeated multiplication is a "naive" strategy. Let's replace it:

def exp200(n):
    return n ** 200

And the results:

time (top 10):
Thu Mar 24 21:32:18 2016    test.profile_info

         2620840 function calls in 30.646 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873612   15.722    0.000   15.722    0.000 {method 'format' of 'str' objects}
        7    9.760    1.394   30.646    4.378 /Users/jhazen/mymodule.py:10(bar)
   873605    5.056    0.000    5.056    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.108    0.000    0.108    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   30.646   30.646 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   30.646   30.646 test.py:5(start)
        1    0.000    0.000   30.646   30.646 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

That's a nice improvement in time. Now str.format() is our worst offender. I added the line in bar() to print the length of the number because my first attempt (just computing the number and doing nothing with it) got optimized away, and my attempt to avoid that (printing the number, which got really big really fast) seemed like it might be blocking on I/O. So I compromised on printing the length of the number. Hey, that's essentially the base-10 log (for a positive integer, the digit count is floor(log10(e)) + 1). Let's try that:

def bar(n):
    for i in xrange(1, n):
        e = exp200(i)
        print "log e: ", math.log10(e)

And the results:

time (top 10):
Thu Mar 24 21:40:16 2016    test.profile_info

         1747235 function calls in 11.279 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        7    6.082    0.869   11.279    1.611 /Users/jhazen/mymodule.py:10(bar)
   873605    4.996    0.000    4.996    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.201    0.000    0.201    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   11.279   11.279 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000   11.279   11.279 test.py:5(start)
        1    0.000    0.000   11.279   11.279 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Hmm, still a fair amount of time spent in bar(), even without the str.format(). Let's get rid of that print:

def bar(n):
    z = 0
    for i in xrange(1, n):
        e = exp200(i)
        z += math.log10(e)
    return z

And the results:

time (top 10):
Thu Mar 24 21:45:24 2016    test.profile_info

         1747235 function calls in 5.031 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605    4.487    0.000    4.487    0.000 /Users/jhazen/mymodule.py:17(exp200)
        7    0.440    0.063    5.031    0.719 /Users/jhazen/mymodule.py:10(bar)
   873605    0.104    0.000    0.104    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:20(baz)
        1    0.000    0.000    5.031    5.031 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    5.031    5.031 test.py:5(start)
        1    0.000    0.000    5.031    5.031 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now it looks like the stuff doing the actual work is the busiest function, so I think we're done optimizing.
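
For anyone wanting to double-check the two rewrites used along the way, here is a rough sanity check (Python 3 syntax; the names exp200_loop, exp200_pow and digit_count are mine, introduced only for this comparison). It confirms that the loop and the ** operator agree, and that the digit count matches floor(log10(e)) + 1 for a few sample bases that are not powers of 10:

```python
import math


def exp200_loop(n):
    # Original "naive" version: 200 repeated multiplications.
    result = 1
    for _ in range(200):
        result *= n
    return result


def exp200_pow(n):
    # Replacement version using the built-in power operator.
    return n ** 200


def digit_count(value):
    # What the original bar() printed: the length of the decimal string.
    return len("{}".format(value))


for n in (2, 3, 7):
    e = exp200_pow(n)
    # Both implementations must produce the same integer.
    assert exp200_loop(n) == e
    # Print the digit count next to the log10-based estimate of it.
    print(n, digit_count(e), math.floor(math.log10(e)) + 1)
```

Exact powers of 10 sit right on the boundary of the floor() call, so I left them out of the check rather than rely on floating-point rounding there.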

Hope that helps!

Answer 3:

Maybe you have run into a similar problem, so I'll describe my issue here. My profiling code looked like this:

def foobar():
    import cProfile
    pr = cProfile.Profile()
    pr.enable()
    for event in reader.events():
        baz()
        # and other things

    pr.disable()
    pr.dump_stats('result.prof')

The final profiling output contained only the events() call. It took me quite a while to realise that I was looking at the profile of an empty loop. foobar() was of course called more than once from the client code, but the meaningful profiling results had been overwritten by the last call, whose loop was empty.