Consider the following code using NumPy arrays, which is very slow:
import numpy as np

# Intersection of an octree and a trajectory
def intersection(octree, trajectory):
    # Initialize numpy arrays
    ox = octree.get("x")
    oy = octree.get("y")
    oz = octree.get("z")
    oe = octree.get("extent")/2
    tx = trajectory.get("x")
    ty = trajectory.get("y")
    tz = trajectory.get("z")
    result = np.zeros(np.size(ox))
    # Loop over elements
    for i in range(0, np.size(tx)):
        for j in range(0, np.size(ox)):
            if (tx[i] > ox[j]-oe[j] and
                    tx[i] < ox[j]+oe[j] and
                    ty[i] > oy[j]-oe[j] and
                    ty[i] < oy[j]+oe[j] and
                    tz[i] > oz[j]-oe[j] and
                    tz[i] < oz[j]+oe[j]):
                result[j] += 1
    # Finalize
    return result
How can I rewrite the function to speed up the calculation? (Here np.size(tx) == 10000 and np.size(ox) == 100000.)
You are allocating 10000 lists of size 100000. The first thing to do would be to stop using range for the nested j loop and use the lazy xrange instead. This will save you the time and space of allocating all those lists.
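For reference, that change touches only the loop headers (this applies to Python 2; in Python 3, range is already lazy and xrange no longer exists):

for i in xrange(np.size(tx)):        # was: range(0, np.size(tx))
    for j in xrange(np.size(ox)):    # was: range(0, np.size(ox))
        pass                         # loop body unchanged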
The next improvement would be to use vectorized operations:
for i in xrange(0, np.size(tx)):
    index = ((ox-oe < tx[i]) & (ox+oe > tx[i]) &
             (oy-oe < ty[i]) & (oy+oe > ty[i]) &
             (oz-oe < tz[i]) & (oz+oe > tz[i]))
    result[index] += 1
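Putting the two suggestions together, a minimal sketch of the full function might look like the following (this assumes the same octree/trajectory .get() interface as in the question, and is written for Python 3, where range is already lazy):

import numpy as np

def intersection(octree, trajectory):
    # Pull the coordinate arrays out once, as in the original
    ox = octree.get("x")
    oy = octree.get("y")
    oz = octree.get("z")
    oe = octree.get("extent")/2
    tx = trajectory.get("x")
    ty = trajectory.get("y")
    tz = trajectory.get("z")
    result = np.zeros(np.size(ox))
    # One vectorized comparison against every octree cell per trajectory point
    for i in range(np.size(tx)):
        inside = ((ox-oe < tx[i]) & (ox+oe > tx[i]) &
                  (oy-oe < ty[i]) & (oy+oe > ty[i]) &
                  (oz-oe < tz[i]) & (oz+oe > tz[i]))
        result[inside] += 1
    return result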
You're likely to get good results by running this code under PyPy: http://pypy.org/ (instructions for our NumPy integration at https://bitbucket.org/pypy/numpy)
I think this will give the same result as the double loop and be faster:
for j in xrange(np.size(ox)):
    result[j] += np.sum((abs(tx-ox[j]) < oe[j]) &
                        (abs(ty-oy[j]) < oe[j]) &
                        (abs(tz-oz[j]) < oe[j]))
To get this: 1) swap the two loops, which is valid since nothing inside depends on the iteration order; 2) pull the result[j] update out of the i loop; 3) convert each pair t > ox-oe and t < ox+oe into abs(t-ox) < oe, wrapping each comparison in parentheses because & binds more tightly than < (this last change may not be a huge speedup on its own, but it's easier to read). The code above also uses np.sum rather than the built-in sum, which is much faster on NumPy arrays.
Since you don't have runnable code, and I didn't want to build a test for this, I'm not 100% sure this is correct.
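As a rough sanity check (not part of the original answer), the two forms can be compared on small random arrays; the sizes and value ranges below are arbitrary:

import numpy as np

np.random.seed(0)
n_t, n_o = 200, 300  # small sizes, just for the check
tx = np.random.uniform(0, 10, n_t)
ty = np.random.uniform(0, 10, n_t)
tz = np.random.uniform(0, 10, n_t)
ox = np.random.uniform(0, 10, n_o)
oy = np.random.uniform(0, 10, n_o)
oz = np.random.uniform(0, 10, n_o)
oe = np.random.uniform(0.1, 1.0, n_o)

# Original double loop from the question
slow = np.zeros(n_o)
for i in range(n_t):
    for j in range(n_o):
        if (tx[i] > ox[j]-oe[j] and tx[i] < ox[j]+oe[j] and
                ty[i] > oy[j]-oe[j] and ty[i] < oy[j]+oe[j] and
                tz[i] > oz[j]-oe[j] and tz[i] < oz[j]+oe[j]):
            slow[j] += 1

# Swapped-loop, vectorized version from this answer
fast = np.zeros(n_o)
for j in range(n_o):
    fast[j] = np.sum((abs(tx-ox[j]) < oe[j]) &
                     (abs(ty-oy[j]) < oe[j]) &
                     (abs(tz-oz[j]) < oe[j]))

assert np.array_equal(slow, fast)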