Suppose I have two matrices:
A: size k x m
B: size m x n
Using a custom operation, my output will be k x n.
This custom operation is not a dot product between the rows of A
and columns of B
. Suppose this custom operation is defined as:
For the Ith row of A
and Jth column of B
, the i,j
element of the output is:
sum( (a[i] + b[j]) ^20 ), i loop over I, j loops over J
The only way I can see to implement this is to expand this equation, calculate each term, them sum them.
Is there a way in numpy or pytorch to do this without expanding the equation?
Apart from the method @hpaulj outlines in the comments, you can also use the fact that what you are calculating is essentially a pair-wise Minkowski distance:
import numpy as np
from scipy.spatial.distance import cdist
k,m,n = 10,20,30
A = np.random.random((k,m))
B = np.random.random((m,n))
method1 = ((A[...,None]+B)**20).sum(axis=1)
method2 = cdist(A,-B.T,'m',p=20)**20
np.allclose(method1,method2)
# True
You can implement it yourself
The following function generates all kind of dot product like functions, but don't use it to replace np.dot, because it will be quite a lot slower for larger arrays.
Template
import numpy as np
import numba as nb
from scipy.spatial.distance import cdist
def gen_dot_like_func(kernel,parallel=True):
kernel_nb=nb.njit(kernel,fastmath=True)
def cust_dot(A,B_in):
B=np.ascontiguousarray(B_in.T)
assert B.shape[1]==A.shape[1]
out=np.empty((A.shape[0],B.shape[0]),dtype=A.dtype)
for i in nb.prange(A.shape[0]):
for j in range(B.shape[0]):
sum=0
for k in range(A.shape[1]):
sum+=kernel_nb(A[i,k],B[j,k])
out[i,j]=sum
return out
if parallel==True:
return nb.njit(cust_dot,fastmath=True,parallel=True)
else:
return nb.njit(cust_dot,fastmath=True,parallel=False)
Generate your function
#This can be useful if you have a lot matrix-multiplication like functions
my_func=gen_dot_like_func(lambda A,B:(A+B)**20,parallel=True)
Timings
k,m,n = 10,20,30
%timeit method1 = ((A[...,None]+B)**20).sum(axis=1)
192 µs ± 554 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit method2 = cdist(A,-B.T,'m',p=20)**20
208 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit res=my_func(A,B) #parallel=False
4.01 µs ± 34.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
k,m,n = 500,100,500
timeit method1 = ((A[...,None]+B)**20).sum(axis=1)
852 ms ± 4.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit method2 = cdist(A,-B.T,'m',p=20)**20
714 ms ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res=my_func(A,B) #parallel=True
1.81 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)