I have a small piece of code from a much larger script. I figured out that when the function t_area
is called, it is responsible for most of the run time. I tested the function by itself, and it is not slow, it takes a lot of time because of the number of times that it has to be ran I believe. Here is the code where the function is called:
tri_area = np.zeros((numx,numy),dtype=float)
for jj in range(0,numy-1):
for ii in range(0,numx-1):
xp = x[ii,jj]
yp = y[ii,jj]
zp = surface[ii,jj]
ap = np.array((xp,yp,zp))
xp = xp+dx
zp = surface[ii+1,jj]
bp = np.array((xp,yp,zp))
yp = yp+dx
zp = surface[ii+1,jj+1]
dp = np.array((xp,yp,zp))
xp = xp-dx
zp = surface[ii,jj+1]
cp = np.array((xp,yp,zp))
tri_area[ii,jj] = t_area(ap,bp,cp,dp)
The size of the arrays in use here are 216 x 217
, and so are the values of x
and y
. I am pretty new to python coding, I have used MATLAB in the past. So my question is, is there a way to get around the two for-loops, or a more efficient way to run through this block of code in general? Looking for any help speeding this up! Thanks!
EDIT:
Thanks for the help everyone, this has cleared alot of confusion up. I was asked about the function t_area that is used in the loop, here is the code below:
def t_area(a,b,c,d):
ab=b-a
ac=c-a
tri_area_a = 0.5*linalg.norm(np.cross(ab,ac))
db=b-d
dc=c-d
tri_area_d = 0.5*linalg.norm(np.cross(db,dc))
ba=a-b
bd=d-b
tri_area_b = 0.5*linalg.norm(np.cross(ba,bd))
ca=a-c
cd=d-c
tri_area_c = 0.5*linalg.norm(np.cross(ca,cd))
av_area = (tri_area_a + tri_area_b + tri_area_c + tri_area_d)*0.5
return(av_area)
Sorry for the confusing notation, at the time it made sense, looking back now I will probably change it. Thanks!
A caveat before we start.
range(0, numy-1)
, which is equal torange(numy-1)
, produces the numbers from 0 to numy-2, not including numy-1. That's because you have numy-1 values from 0 to numy-2. While MATLAB has 1-based indexing, Python has 0-based, so be a bit careful with your indexing in the transition. Considering you havetri_area = np.zeros((numx, numy), dtype=float)
,tri_area[ii,jj]
never accesses the last row or column with the way you have set up your loops. Therefore, I suspect the correct intention was to writerange(numy)
.Since the fuction
t_area()
is vectorisable, you can do away with the loops completely. Vectorisation means numpy applies some operations on a whole array at the same time by taking care of the loops under the hood, where they will be faster.First, we stack all the
ap
s for each (i, j) element in a (m, n, 3) array, where (m, n) is the size ofx
. If we take the cross product of two (m, n, 3) arrays, the operation will be applied on the last axis by default. This means thatnp.cross(a, b)
will do for every element (i, j) take the cross product of the 3 numbers ina[i,j]
andb[i,j]
. Similarly,np.linalg.norm(a, axis=2)
will do for every element (i, j) calculate the norm of the 3 numbers ina[i,j]
. This will also effectively reduce our array to size (m, n). A bit of caution here though, as we need to explicitly state we want this operation done on the 2nd axis.Note that in the following example my indexing relationship may not correspond to yours. The bare minimum to make this work is for
surface
to have one extra row and column fromx
andy
.x
in this example supports the indices 0-249, whilesurface
0-250.surface[:-1]
, a shorthand forsurface[0:-1]
, will return all rows starting from 0 and up to the last one, but not including it.-1
serves the same function andend
in MATLAB. So,surface[:-1]
will return the rows for indices 0-249. Similarly,surface[1:]
will return the rows for indices 1-250, which achieves the same as yoursurface[ii+1]
.Note: I had written this section before it was known that
t_area()
could be fully vectorised. So while what is here is obsolete for the purposes of this answer, I'll leave it as legacy to show what optimisations could have been made had the function not be vectorisable.Instead of calling the function for each element, which is expensive, you should pass it
x
,y,
,surface
anddx
and iterate internally. That means only one function call and less overhead.Furthermore, you shouldn't create an array for
ap
,bp
,cp
anddp
every loop, which again, adds overhead. Allocate them once outside the loop and just update their values.One final change should be the order of loops. Numpy arrays are row major by default (while MATLAB is column major), so
ii
performs better as the outer loop. You wouldn't notice the difference for arrays of your size, but hey, why not?Overall, the modified function should look like this.