I want to compute eigenvectors for an array of data (in my actual case, i cloud of polygons)
To do so i wrote this function:
import numpy as np
def eigen(data):
eigenvectors = []
eigenvalues = []
for d in data:
# compute covariance for each triangle
cov = np.cov(d, ddof=0, rowvar=False)
# compute eigen vectors
vals, vecs = np.linalg.eig(cov)
eigenvalues.append(vals)
eigenvectors.append(vecs)
return np.array(eigenvalues), np.array(eigenvectors)
Running this on some test data:
import cProfile
triangles = np.random.random((10**4,3,3,)) # 10k 3D triangles
cProfile.run('eigen(triangles)') # 550005 function calls in 0.933 seconds
Works fine but it gets very slow because of the iteration loop. Is there a faster way to compute the data I need without iterating over the array? And if not can anyone suggest ways to speed it up?
Hack It!
Well I hacked into covariance func definition
and put in the stated input states : ddof=0, rowvar=False
and as it turns out, everything reduces to just three lines -
nC = m.shape[1] # m is the 2D input array
X = m - m.mean(0)
out = np.dot(X.T, X)/nC
To extend it to our 3D array case, I wrote down the loopy version with these three lines being iterated for the 2D arrays sections from the 3D input array, like so -
for i,d in enumerate(m):
# Using np.cov :
org_cov = np.cov(d, ddof=0, rowvar=False)
# Using earlier 2D array hacked version :
nC = m[i].shape[0]
X = m[i] - m[i].mean(0,keepdims=True)
hacked_cov = np.dot(X.T, X)/nC
Boost-it-up
We are needed to speedup the last three lines there. Computation of X
across all iterations could be done with broadcasting
-
diffs = data - data.mean(1,keepdims=True)
Next up, the dot-product calculation for all iterations could be done with transpose
and np.dot
, but that transpose
could be a costly affair for such a multi-dimensional array. A better alternative exists in np.einsum
, like so -
cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
Use it!
To sum up :
for d in data:
# compute covariance for each triangle
cov = np.cov(d, ddof=0, rowvar=False)
Could be pre-computed like so :
diffs = data - data.mean(1,keepdims=True)
cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
These pre-computed values could be used across iterations to compute eigen vectors like so -
for i,d in enumerate(data):
# Directly use pre-computed covariances for each triangle
vals, vecs = np.linalg.eig(cov3D[i])
Test It!
Here are some runtime tests to assess the effect of pre-computing covariance results -
In [148]: def original_app(data):
...: cov = np.empty(data.shape)
...: for i,d in enumerate(data):
...: # compute covariance for each triangle
...: cov[i] = np.cov(d, ddof=0, rowvar=False)
...: return cov
...:
...: def vectorized_app(data):
...: diffs = data - data.mean(1,keepdims=True)
...: return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
...:
In [149]: data = np.random.randint(0,10,(1000,3,3))
In [150]: np.allclose(original_app(data),vectorized_app(data))
Out[150]: True
In [151]: %timeit original_app(data)
10 loops, best of 3: 64.4 ms per loop
In [152]: %timeit vectorized_app(data)
1000 loops, best of 3: 1.14 ms per loop
In [153]: data = np.random.randint(0,10,(5000,3,3))
In [154]: np.allclose(original_app(data),vectorized_app(data))
Out[154]: True
In [155]: %timeit original_app(data)
1 loops, best of 3: 324 ms per loop
In [156]: %timeit vectorized_app(data)
100 loops, best of 3: 5.67 ms per loop
I don't know how much of a speed up you can actually achieve.
Here is a slight modification that can help a little:
%timeit -n 10 values, vectors = \
eigen(triangles)
10 loops, best of 3: 745 ms per loop
%timeit values, vectors = \
zip(*(np.linalg.eig(np.cov(d, ddof=0, rowvar=False)) for d in triangles))
10 loops, best of 3: 705 ms per loop