This is a loop for extracting the RGB values of two images, and calculating the sum of squared differences across all three channels.
Running this code directly in my main.py takes 0.07 sec. The speed gets reduced to 1sec if I run it in this .pyx file instead. I have read about cdef the functions, but I have had no success of passing the arrays then. Any help on converting this function to cdef function would be appreciated. I really need this loop to go as fast as possible.
from cpython cimport array
import array
import numpy as np
cimport numpy as np
def fittnes(Orginal, Mutated):
Fittnes = 0
for x in range(0, 299):
for y in range(0, 299):
DeltaRed = (Orginal[x][y][0] - Mutated[x][y][0])
DeltaGreen = (Orginal[x][y][1] - Mutated[x][y][1])
DeltaBlue = (Orginal[x][y][2] - Mutated[x][y][2])
Fittnes += (DeltaRed * DeltaRed + DeltaGreen * DeltaGreen + DeltaBlue * DeltaBlue)
return Fittnes
My Main.py function call
NewScore = cythona.fittnes(numpy.array(Orginal), numpy.array(MutatedImage))
Got me interested to know about the speedup numbers, so I am posting this as a solution. So, as stated/discussed in the comments, if the inputs are NumPy arrays, you could use native NumPy tools, and in this case ndarray.sum()
, like so -
out = ((Orginal - Mutated)**2).sum()
You can also use the very efficient np.einsum
for the same task, like so -
sub = Orginal - Mutated
out = np.einsum('ijk,ijk->',sub,sub)
Runtime tests
Define functions -
def org_app(Orginal,Mutated):
Fittnes = 0
for x in range(0, Orginal.shape[0]):
for y in range(0, Orginal.shape[1]):
DR = (Orginal[x][y][0] - Mutated[x][y][0])
DG = (Orginal[x][y][1] - Mutated[x][y][1])
DB = (Orginal[x][y][2] - Mutated[x][y][2])
Fittnes += (DR * DR + DG * DG + DB * DB)
return Fittnes
def einsum_based(Orginal,Mutated):
sub = Orginal - Mutated
return np.einsum('ijk,ijk->',sub,sub)
def dot_based(Orginal,Mutated): # @ali_m's suggestion
sub = Orginal - Mutated
return np.dot(sub.ravel(), sub.ravel())
def vdot_based(Orginal,Mutated): # variant of @ali_m's suggestion
sub = Orginal - Mutated
return np.vdot(sub, sub)
Timings -
In [14]: M,N = 100,100
...: Orginal = np.random.rand(M,N,3)
...: Mutated = np.random.rand(M,N,3)
...:
In [15]: %timeit org_app(Orginal,Mutated)
...: %timeit ((Orginal - Mutated)**2).sum()
...: %timeit einsum_based(Orginal,Mutated)
...: %timeit dot_based(Orginal,Mutated)
...: %timeit vdot_based(Orginal,Mutated)
...:
10 loops, best of 3: 54.9 ms per loop
10000 loops, best of 3: 112 µs per loop
10000 loops, best of 3: 69.8 µs per loop
10000 loops, best of 3: 86.2 µs per loop
10000 loops, best of 3: 85.3 µs per loop
In [16]: # Inputs
...: M,N = 1000,1000
...: Orginal = np.random.rand(M,N,3)
...: Mutated = np.random.rand(M,N,3)
...:
In [17]: %timeit org_app(Orginal,Mutated)
...: %timeit ((Orginal - Mutated)**2).sum()
...: %timeit einsum_based(Orginal,Mutated)
...: %timeit dot_based(Orginal,Mutated)
...: %timeit vdot_based(Orginal,Mutated)
...:
1 loops, best of 3: 5.49 s per loop
10 loops, best of 3: 63 ms per loop
10 loops, best of 3: 23.9 ms per loop
10 loops, best of 3: 24.9 ms per loop
10 loops, best of 3: 24.9 ms per loop