Fast weighted scatter matrix calculation


In this question six months ago, jez was nice enough to help me come up with a fast way to compute the sum of outer products of row differences, i.e.:

K = np.zeros((X.shape[1], X.shape[1]))   # scatter matrix is D x D for N x D data
for i, Xi in enumerate(X):
  for j, Xj in enumerate(X):
    dij = Xi - Xj
    K += np.outer(dij, dij)

That worked for computing the scatter matrix needed in a form of Fisher Discriminant Analysis. But now I am trying to do Local Fisher Discriminant Analysis, where each outer product is weighted by a matrix A that encodes the pair's locality, so the new inner-loop line is:

K += A[i, j] * np.outer(dij, dij)

Unfortunately, the quick way of calculating the unweighted scatter matrix presented in the previous answer doesn't work here, and as far as I can tell there is no easy way to adapt it.

Linear algebra is definitely not my strong suit, and I'm not good at coming up with these kinds of manipulations. What is a fast way to calculate the weighted sum of pairwise row-difference outer products?

1 Answer

Here is a way to vectorize the calculation you specified. If you do a lot of this kind of thing, it may be worth learning how to use numpy.tensordot. It multiplies elements according to standard NumPy broadcasting and then sums over the pairs of axes given with the axes keyword argument.
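
As a toy illustration of what tensordot does (separate from the solution itself): contracting two arrays over their first two axes is the same as multiplying elementwise under broadcasting and then summing over those axes.

import numpy as np

b = np.random.random((4, 5, 3))
c = np.random.random((4, 5, 6))

# Sum b[i, j, k] * c[i, j, l] over i and j; the result has shape (3, 6).
out = np.tensordot(b, c, axes=[(0, 1), (0, 1)])
print(out.shape)                                          # (3, 6)
print(np.allclose(out, np.einsum('ijk,ijl->kl', b, c)))  # True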

Here is the code:

# Imports
import numpy as np
from numpy.random import random

# Original double-loop calculation, kept for testing purposes
def ftrue(A, X):
  """Reference implementation with explicit Python loops (slow)."""
  D = X.shape[1]
  K = np.zeros((D, D))         # unweighted scatter matrix (from the question)
  KA_true = np.zeros((D, D))   # weighted scatter matrix

  for i, Xi in enumerate(X):
    for j, Xj in enumerate(X):
      dij = Xi - Xj
      K += np.outer(dij, dij)
      KA_true += A[i, j] * np.outer(dij, dij)
  return KA_true

# Better: no Python loops, but it makes a large temporary array.
def fbetter(A, X):
  """Vectorized, but materializes the full weighted-difference array."""
  c = X[:, None, :] - X[None, :, :]   # pairwise differences, shape (N, N, D)
  b = A[:, :, None] * c               # ! BAD ! temporary array of size N*N*D
  KA_better = np.tensordot(b, c, axes=[(0, 1), (0, 1)])
  return KA_better

# Best way: no Python loops and no temporaries larger than the inputs/output.
def fbest(A, X):
  """Expand the quadratic form so only matrix-sized products are needed."""
  KA_best = np.tensordot(A.sum(1)[:, None] * X, X, axes=[(0,), (0,)])
  KA_best += np.tensordot(A.sum(0)[:, None] * X, X, axes=[(0,), (0,)])
  KA_best -= np.tensordot(np.dot(A, X), X, axes=[(0,), (0,)])
  KA_best -= np.tensordot(X, np.dot(A, X), axes=[(0,), (0,)])
  return KA_best


# Test script
if __name__ == "__main__":

  # Parameters for the computation 
  N = 250
  X = random((N, N))
  A = random((N, N))

  # Compute both vectorized versions
  KA_better = fbetter(A, X)
  KA_best = fbest(A, X)

  # Test against true if array size isn't too big
  if N<100:
    KA_true = ftrue(A, X)
    err = abs(KA_better - KA_true).mean()
    msg = "Mean absolute difference (better): {}."
    print(msg.format(err))

  # Test best against better
  err = abs(KA_best - KA_better).mean()
  msg = "Mean absolute difference (best): {}."
  print(msg.format(err))

My first attempt (fbetter) makes a large temporary array of size N x N x D (N x N x N in the test above). The second attempt (fbest) never makes anything bigger than N x N. This works pretty well up to N ~ 1000.
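
For reference, fbest is just the expansion of the weighted sum of outer products, with the order of summation swapped so that every term reduces to a matrix-sized product:

$$\sum_{i,j} A_{ij}\,(x_i - x_j)(x_i - x_j)^T
  = \sum_i \Bigl(\sum_j A_{ij}\Bigr) x_i x_i^T
  + \sum_j \Bigl(\sum_i A_{ij}\Bigr) x_j x_j^T
  - \sum_{i,j} A_{ij}\, x_j x_i^T
  - \sum_{i,j} A_{ij}\, x_i x_j^T$$

The four terms correspond, in order, to the four tensordot lines in fbest: the row sums of A weighting x_i x_i^T, the column sums weighting x_j x_j^T, and the two cross terms built from np.dot(A, X).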

A timing test

Also, the code runs faster when the output array is smaller. (Timing plot not reproduced here.)

I have MKL installed, so the calls to tensordot are really fast and run in parallel.
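
If you want to reproduce the timings, a minimal sketch along these lines should do. It reuses fbetter, fbest, and random from the code above; the exact numbers depend on your BLAS/MKL build, and fbetter's N x N x N temporary keeps N from growing much past a few hundred.

import timeit

# Rough timing comparison; ftrue is omitted because the double loop is far too slow.
for N in (100, 200, 300):
  X = random((N, N))
  A = random((N, N))
  t_better = timeit.timeit(lambda: fbetter(A, X), number=3) / 3
  t_best = timeit.timeit(lambda: fbest(A, X), number=3) / 3
  print("N={}: fbetter {:.3f} s, fbest {:.3f} s".format(N, t_better, t_best))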

Thanks for the question. This was a nice exercise and reminded me how important it is to avoid making large temporary arrays.
