Efficiently calculating sum of squared differences

2019-08-10 19:57发布

问题:

This is a loop for extracting the RGB values of two images, and calculating the sum of squared differences across all three channels. Running this code directly in my main.py takes 0.07 sec. The speed gets reduced to 1sec if I run it in this .pyx file instead. I have read about cdef the functions, but I have had no success of passing the arrays then. Any help on converting this function to cdef function would be appreciated. I really need this loop to go as fast as possible.

from cpython cimport array
import array
import numpy as np
cimport numpy as np

def fittnes(Orginal, Mutated):

    Fittnes = 0

    for x in range(0, 299):

        for y in range(0, 299):

            DeltaRed   = (Orginal[x][y][0] - Mutated[x][y][0])
            DeltaGreen = (Orginal[x][y][1] - Mutated[x][y][1])
            DeltaBlue  = (Orginal[x][y][2] - Mutated[x][y][2])

            Fittnes += (DeltaRed * DeltaRed + DeltaGreen * DeltaGreen + DeltaBlue * DeltaBlue)

    return Fittnes

My Main.py function call

 NewScore = cythona.fittnes(numpy.array(Orginal), numpy.array(MutatedImage))

回答1:

Got me interested to know about the speedup numbers, so I am posting this as a solution. So, as stated/discussed in the comments, if the inputs are NumPy arrays, you could use native NumPy tools, and in this case ndarray.sum(), like so -

out = ((Orginal - Mutated)**2).sum()

You can also use the very efficient np.einsum for the same task, like so -

sub = Orginal - Mutated
out = np.einsum('ijk,ijk->',sub,sub)

Runtime tests

Define functions -

def org_app(Orginal,Mutated):
    Fittnes = 0
    for x in range(0, Orginal.shape[0]):
        for y in range(0, Orginal.shape[1]):
            DR = (Orginal[x][y][0] - Mutated[x][y][0])
            DG = (Orginal[x][y][1] - Mutated[x][y][1])
            DB  = (Orginal[x][y][2] - Mutated[x][y][2])
            Fittnes += (DR * DR + DG * DG + DB * DB)
    return Fittnes

def einsum_based(Orginal,Mutated):
    sub = Orginal - Mutated
    return np.einsum('ijk,ijk->',sub,sub)

def dot_based(Orginal,Mutated): # @ali_m's suggestion
    sub = Orginal - Mutated
    return np.dot(sub.ravel(), sub.ravel())

def vdot_based(Orginal,Mutated):  # variant of @ali_m's suggestion
    sub = Orginal - Mutated
    return np.vdot(sub, sub)

Timings -

In [14]: M,N = 100,100
    ...: Orginal = np.random.rand(M,N,3)
    ...: Mutated = np.random.rand(M,N,3)
    ...: 

In [15]: %timeit org_app(Orginal,Mutated)
    ...: %timeit ((Orginal - Mutated)**2).sum()
    ...: %timeit einsum_based(Orginal,Mutated)
    ...: %timeit dot_based(Orginal,Mutated)
    ...: %timeit vdot_based(Orginal,Mutated)
    ...: 
10 loops, best of 3: 54.9 ms per loop
10000 loops, best of 3: 112 µs per loop
10000 loops, best of 3: 69.8 µs per loop
10000 loops, best of 3: 86.2 µs per loop
10000 loops, best of 3: 85.3 µs per loop

In [16]: # Inputs
    ...: M,N = 1000,1000
    ...: Orginal = np.random.rand(M,N,3)
    ...: Mutated = np.random.rand(M,N,3)
    ...: 

In [17]: %timeit org_app(Orginal,Mutated)
    ...: %timeit ((Orginal - Mutated)**2).sum()
    ...: %timeit einsum_based(Orginal,Mutated)
    ...: %timeit dot_based(Orginal,Mutated)
    ...: %timeit vdot_based(Orginal,Mutated)
    ...: 
1 loops, best of 3: 5.49 s per loop
10 loops, best of 3: 63 ms per loop
10 loops, best of 3: 23.9 ms per loop
10 loops, best of 3: 24.9 ms per loop
10 loops, best of 3: 24.9 ms per loop