Speeding up a numpy loop in python?

2019-06-19 03:33发布

问题:

Consider the following code using numpy arrays which is very slow :

# Intersection of an octree and a trajectory
def intersection(octree, trajectory):
    # Initialize numpy arrays
    ox = octree.get("x")
    oy = octree.get("y")
    oz = octree.get("z")
    oe = octree.get("extent")/2
    tx = trajectory.get("x")
    ty = trajectory.get("y")
    tz = trajectory.get("z")
    result = np.zeros(np.size(ox))
    # Loop over elements
    for i in range(0, np.size(tx)):
        for j in range(0, np.size(ox)):
            if (tx[i] > ox[j]-oe[j] and 
                tx[i] < ox[j]+oe[j] and 
                ty[i] > oy[j]-oe[j] and 
                ty[i] < oy[j]+oe[j] and 
                tz[i] > oz[j]-oe[j] and 
                tz[i] < oz[j]+oe[j]):
                result[j] += 1
    # Finalize
    return result

How to rewrite the function to speed up the calculation ? (np.size(tx) == 10000 and np.size(ox) == 100000)

回答1:

You are allocating 10000 lists of size 100000. The first thing to do would be to stop using range for the nested j loop and use the generator version xrange instead. This will save you time and space allocating all those lists.

The next one would be to use vectorized operations:

for i in xrange(0, np.size(tx)):
    index = (ox-oe < tx[i]) & (ox+oe > tx[i]) & (oy-oe < ty[i]) & (oy+oe > ty[i]) & (oz-oe < tz[i]) & (oz+oe > tz[i])
    result[index] += 1  


回答2:

You're likely to get good results by running this code under PyPy: http://pypy.org/ (instructions for our NumPy integration at https://bitbucket.org/pypy/numpy)



回答3:

I think this will give the same result for the double loop and be faster:

for j in xrange(np.size(ox)):
    result[j] += sum( abs(tx-ox[j])<oe[j] & abs(ty-oy[j])<oe[j] & abs(tz-oz[j])<oe[j] )

To get this: 1) reorder the loops (ie, swap them) which is valid since nothing changes within the loops; 2) pull result[j] outside the i loop; 3) convert all the t>ox-oe and t<ox+oe to abs(t-ox)<oe (though this may not be a huge speedup, it's easier to read).

Since you don't have runnable code, and I didn't want to build a test for this, I'm not 100% sure this is correct.