I use NumPy's einsum function to multiply two 2D NumPy arrays element-wise and sum the result:
np.einsum('ij,ij',A,B)
Both arrays, A and B, are roughly 10000 x 10000. This operation is the bottleneck in my code, taking up ~85% of the processing time. How can I parallelize it quickly?
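For concreteness, here is a minimal sketch of the computation and of one possible way to split it across threads. The helper name `blocked_sum_of_products`, the `n_blocks` parameter, and the smaller 1000 x 1000 random arrays are assumptions made for illustration only. The sketch just checks that row-block partial sums reproduce the einsum result. Whether threads actually give a speedup depends on the NumPy build and on memory bandwidth, so it would need to be timed.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def blocked_sum_of_products(A, B, n_blocks=4):
    """Sum of element-wise products, computed per row block in a thread pool.

    Hypothetical helper: each worker handles a contiguous slice of rows and
    the partial sums are added together at the end.
    """
    # Row boundaries that split the first axis into n_blocks contiguous pieces.
    bounds = np.linspace(0, A.shape[0], n_blocks + 1, dtype=int)

    def partial(k):
        lo, hi = bounds[k], bounds[k + 1]
        # Same contraction as the original call, restricted to one row block.
        return np.einsum('ij,ij', A[lo:hi], B[lo:hi])

    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        return sum(pool.map(partial, range(n_blocks)))


# Placeholder data, smaller than the real 10000 x 10000 arrays.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
B = rng.standard_normal((1000, 1000))

reference = np.einsum('ij,ij', A, B)
blocked = blocked_sum_of_products(A, B)

# The same quantity as a dot product of the flattened arrays.
flat_dot = np.dot(A.ravel(), B.ravel())
```

All three values (`reference`, `blocked`, `flat_dot`) should agree up to floating-point rounding. The flattened `np.dot` form is included because it may be dispatched to the BLAS library NumPy is linked against, which is sometimes multi-threaded, but that depends on the installation.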