How to vectorize custom algorithms in numpy or pytorch

Posted 2020-05-09 04:36

Question:

Suppose I have two matrices:

A: size k x m

B: size m x n

Using a custom operation, my output will be k x n.

This custom operation is not a dot product between the rows of A and columns of B. Suppose this custom operation is defined as:

For the i-th row of A and the j-th column of B, the (i, j) element of the output is:

sum( (A[i, k] + B[k, j]) ** 20 ), where k loops over the m shared indices

The only way I can see to implement this is to expand the equation, calculate each term, then sum them.

Is there a way in numpy or pytorch to do this without expanding the equation?
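For reference, here is a sketch of the operation both as explicit loops and as a one-line numpy broadcast (the exponent 20 and the shapes are taken from the question; the function names are just illustrative):

```python
import numpy as np

def custom_op_loops(A, B):
    """Explicit-loop reference: out[i, j] = sum over k of (A[i, k] + B[k, j]) ** 20."""
    k, m = A.shape
    m2, n = B.shape
    assert m == m2
    out = np.zeros((k, n))
    for i in range(k):
        for j in range(n):
            for kk in range(m):
                out[i, j] += (A[i, kk] + B[kk, j]) ** 20
    return out

def custom_op_broadcast(A, B):
    """Same operation vectorized: A[..., None] is (k, m, 1), B is (m, n),
    so their sum broadcasts to (k, m, n); reduce over the shared axis 1."""
    return ((A[..., None] + B) ** 20).sum(axis=1)

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 6))
print(np.allclose(custom_op_loops(A, B), custom_op_broadcast(A, B)))  # True
```

The broadcast version trades memory (a temporary k x m x n array) for speed, which is why the answers below explore alternatives for larger inputs.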

Answer 1:

Apart from the method @hpaulj outlines in the comments, you can also exploit the fact that what you are calculating is essentially a pairwise Minkowski distance: because the exponent 20 is even, (a + b)**20 == |a - (-b)|**20, so your sum is the 20th power of the p=20 Minkowski distance between the rows of A and the columns of -B:

import numpy as np
from scipy.spatial.distance import cdist

k,m,n = 10,20,30
A = np.random.random((k,m))
B = np.random.random((m,n))

method1 = ((A[...,None]+B)**20).sum(axis=1)
method2 = cdist(A, -B.T, 'minkowski', p=20)**20

np.allclose(method1,method2)
# True

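The identity behind this trick can be checked directly in pure numpy for a single row/column pair (a sketch; the even exponent p = 20 is what makes the absolute value in the distance harmless):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(20)  # one row of A
b = rng.random(20)  # one column of B

p = 20
target = ((a + b) ** p).sum()                        # the custom operation
minkowski = (np.abs(a - (-b)) ** p).sum() ** (1 / p) # Minkowski distance from a to -b
print(np.allclose(target, minkowski ** p))  # True
```

Raising the distance back to the p-th power undoes the 1/p root that the Minkowski metric applies, recovering the plain sum.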

Answer 2:

You can implement it yourself.

The following function generates all kinds of dot-product-like functions. Don't use it as a drop-in replacement for np.dot, though: for large arrays it will be quite a lot slower than the optimized BLAS routine behind np.dot.

Template

import numpy as np
import numba as nb
from scipy.spatial.distance import cdist

def gen_dot_like_func(kernel, parallel=True):
    # Compile the scalar kernel once; Numba inlines it into the loops below.
    kernel_nb = nb.njit(kernel, fastmath=True)

    def cust_dot(A, B_in):
        # Transpose B so both operands are traversed contiguously in C order.
        B = np.ascontiguousarray(B_in.T)
        assert B.shape[1] == A.shape[1]

        out = np.empty((A.shape[0], B.shape[0]), dtype=A.dtype)
        # prange parallelizes the outer loop when compiled with parallel=True
        # and falls back to an ordinary range otherwise.
        for i in nb.prange(A.shape[0]):
            for j in range(B.shape[0]):
                acc = 0
                for k in range(A.shape[1]):
                    acc += kernel_nb(A[i, k], B[j, k])
                out[i, j] = acc
        return out

    if parallel:
        return nb.njit(cust_dot, fastmath=True, parallel=True)
    else:
        return nb.njit(cust_dot, fastmath=True, parallel=False)

Generate your function

# This can be useful if you have a lot of matrix-multiplication-like functions
my_func = gen_dot_like_func(lambda A, B: (A + B)**20, parallel=True)

Timings

k,m,n = 10,20,30
A = np.random.random((k,m))
B = np.random.random((m,n))

%timeit method1 = ((A[...,None]+B)**20).sum(axis=1)
192 µs ± 554 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit method2 = cdist(A, -B.T, 'minkowski', p=20)**20
208 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit res = my_func(A,B)  # parallel=False
4.01 µs ± 34.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

k,m,n = 500,100,500
A = np.random.random((k,m))
B = np.random.random((m,n))

%timeit method1 = ((A[...,None]+B)**20).sum(axis=1)
852 ms ± 4.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit method2 = cdist(A, -B.T, 'minkowski', p=20)**20
714 ms ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res = my_func(A,B)  # parallel=True
1.81 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)