I just realized that doing
x.real*x.real+x.imag*x.imag
is three times faster than doing
abs(x)**2
where x is a numpy array of complex numbers. For code readability, I could define a function like
def abs2(x):
    return x.real*x.real+x.imag*x.imag
which is still far faster than abs(x)**2, but at the cost of a function call. Is it possible to inline such a function, as I would do in C using a macro or the inline keyword?
Is it possible to inline such a function, as I would do in C using macro or using inline keyword?
No. Before reaching this specific instruction, Python interpreters don't even know if there's such a function, much less what it does.
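To illustrate why the interpreter cannot know ahead of time what a call will do, here is a minimal sketch (the names caller, first and second are made up for this example). The name abs2 is looked up each time the call runs, so rebinding it later changes what an already-defined caller executes:

```python
def abs2(z):
    # Squared magnitude of a single complex number.
    return z.real * z.real + z.imag * z.imag


def caller(z):
    # The name abs2 is resolved in the module globals each time this runs,
    # not when caller() is defined.
    return abs2(z)


first = caller(3 + 4j)  # uses the original definition


def abs2(z):
    # Rebinding the same name later changes what caller() ends up executing.
    return -1.0


second = caller(3 + 4j)  # now uses the replacement
```

Any inlining done when caller was compiled would have produced the wrong result for the second call.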
As noted in the comments, PyPy inlines automatically. The above still holds: it "simply" generates an optimized version at runtime, benefits from it, and falls back when that version is invalidated. In this specific case that doesn't help, though, because the NumPy implementation for PyPy was started only recently and isn't even beta quality yet. The bottom line: don't worry about optimizations at this level in Python. Either the implementation optimizes it for you or it doesn't; it's not your responsibility.
Not exactly what the OP has asked for, but close:
Inliner inlines Python function calls. Proof of concept for this blog post.
from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(add_stuff(i, i+1))
In the above code the add_lots_of_numbers function is converted into this:
def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(i + i + 1)
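For a rough idea of how a transformation like the one above could work, here is a minimal sketch built on the standard ast module. It is not the inliner package's actual implementation: the class names ParamSubstituter and CallInliner are invented for illustration, it only handles functions whose body is a single return statement called with positional arguments, it uses Python 3 (range instead of xrange, and ast.unparse, which needs Python 3.9 or later), and a return results line is added so the outcome can be checked.

```python
import ast
import copy

SOURCE = """
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in range(10):
        results.append(add_stuff(i, i + 1))
    return results
"""


class ParamSubstituter(ast.NodeTransformer):
    """Replace parameter names with copies of the argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class CallInliner(ast.NodeTransformer):
    """Inline calls to one function whose body is a single `return <expr>`."""

    def __init__(self, func_def):
        self.name = func_def.name
        self.params = [arg.arg for arg in func_def.args.args]
        self.body_expr = func_def.body[0].value  # expression after `return`

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls in the arguments first
        is_target = isinstance(node.func, ast.Name) and node.func.id == self.name
        if is_target and not node.keywords and len(node.args) == len(self.params):
            mapping = dict(zip(self.params, node.args))
            return ParamSubstituter(mapping).visit(copy.deepcopy(self.body_expr))
        return node


tree = ast.parse(SOURCE)
tree = CallInliner(tree.body[0]).visit(tree)  # tree.body[0] is add_stuff

# Turn the rewritten tree back into source text and execute it.
inlined_source = ast.unparse(tree.body[1])  # add_lots_of_numbers after inlining
namespace = {}
exec(ast.unparse(tree), namespace)
```

After running this, inlined_source holds the rewritten add_lots_of_numbers, and namespace["add_lots_of_numbers"]() returns the same list as the original would. The sketch ignores everything that makes real inlining hard, such as name rebinding, side effects in arguments that are used more than once, and multi-statement bodies.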
Anyone interested in this question and the complications involved in implementing such an optimizer in CPython might also want to have a look at:
- Issue 10399: AST Optimization: inlining of function calls
- PEP 511 -- API for code transformers (Rejected)
I agree with everyone else that such optimizations will just cause you pain on CPython, and that if you care about performance you should consider PyPy (though our NumPy may be too incomplete to be useful). However, I disagree that you can't care about such optimizations on PyPy. Not this one specifically, since, as has been said, PyPy does it automatically; but if you know PyPy well, you really can tune your code to make PyPy emit the assembly you want. Not that you almost ever need to.
No.
The closest you can get to C macros is a script (awk or other) that you may include in a makefile, and which substitutes a certain pattern like abs(x)**2 in your Python scripts with the long form.
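As a minimal sketch of such a substitution step, written in Python rather than awk: the function name expand_abs2 and the pattern are assumptions for illustration, and the pattern only handles abs() applied to a plain variable name, not to arbitrary expressions.

```python
import re

# Matches abs(<name>)**2 where <name> is a simple identifier.
# The lookbehind skips attribute calls such as np.abs(x)**2,
# and the lookahead skips other exponents such as **2.5 or **20.
PATTERN = re.compile(
    r"(?<![\w.])abs\(\s*([A-Za-z_]\w*)\s*\)\s*\*\*\s*2(?![\w.])"
)


def expand_abs2(source):
    """Rewrite abs(x)**2 into the long form using real and imaginary parts."""
    return PATTERN.sub(r"(\1.real*\1.real + \1.imag*\1.imag)", source)


before = "y = abs(x)**2 + abs(w) ** 2"
after = expand_abs2(before)
```

A build step could apply expand_abs2 to each source file before running it. Like a C macro, this is a purely textual rewrite, so it knows nothing about scoping or about whether abs has been redefined.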
Actually, it might be even faster to calculate it like this:
x.real**2 + x.imag**2
Thus, the extra cost of the function call is likely to diminish. Let's see:
In []: n= 10000
In []: x= randn(n, 1)+ 1j* rand(n, 1)
In []: %timeit x.real* x.real+ x.imag* x.imag
10000 loops, best of 3: 100 us per loop
In []: %timeit x.real** 2+ x.imag** 2
10000 loops, best of 3: 77.9 us per loop
And encapsulating the calculation in a function:
In []: def abs2(x):
   ..:     return x.real** 2+ x.imag** 2
   ..:
In []: %timeit abs2(x)
10000 loops, best of 3: 80.1 us per loop
Anyway (as others have pointed out), this kind of micro-optimization (in order to avoid a function call) is not really a productive way to write Python code.
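For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. The array size, seed, and repeat counts are arbitrary choices, the helper best_time is made up for this example, and the absolute numbers will differ by machine and NumPy version. It also checks that the function agrees with abs(x)**2.

```python
import timeit

import numpy as np


def abs2(x):
    # Squared magnitude without the square root that abs() computes.
    return x.real**2 + x.imag**2


rng = np.random.default_rng(0)
n = 10_000
x = rng.standard_normal((n, 1)) + 1j * rng.standard_normal((n, 1))

# Both forms should agree up to floating-point rounding.
agrees = np.allclose(abs2(x), abs(x) ** 2)


def best_time(func, number=200, repeat=3):
    """Best total time in seconds for `number` calls, over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


timings = {
    "abs(x)**2": best_time(lambda: abs(x) ** 2),
    "x.real*x.real + x.imag*x.imag": best_time(
        lambda: x.real * x.real + x.imag * x.imag
    ),
    "abs2(x)": best_time(lambda: abs2(x)),
}
```

Each entry in timings is wrapped in a lambda, so all three variants pay the same extra call overhead and remain comparable with each other.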