I fit a model using scikit-learn's NMF on my training data. Now I perform an inverse transform of new data using
result_1 = model.inverse_transform(model.transform(new_data))
Then I compute the inverse transform of my data manually, taking the components from the NMF model and using the equation as in Slide 15 here.
temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)),
                   model.components_)
result_2 = np.dot(new_data, transform)
I would like to understand why the two results do not match. What am I doing wrong when computing the inverse transform and reconstructing the data?
Sample code:
import numpy as np
from sklearn.decomposition import NMF
data = np.array([[0,0,1,1,1],[0,1,1,0,0],[0,1,0,0,0],[1,0,0,1,0]])
print(data)
# array([[0, 0, 1, 1, 1],
#        [0, 1, 1, 0, 0],
#        [0, 1, 0, 0, 0],
#        [1, 0, 0, 1, 0]])
model = NMF(alpha=0.0, init='random', l1_ratio=0.0, max_iter=200,
            n_components=2, random_state=0, shuffle=False, solver='cd',
            tol=0.0001, verbose=0)
model.fit(data)
# NMF(alpha=0.0, beta_loss='frobenius', init='random', l1_ratio=0.0,
#     max_iter=200, n_components=2, random_state=0, shuffle=False, solver='cd',
#     tol=0.0001, verbose=0)
new_data = np.array([[0,0,1,0,0], [1,0,0,0,0]])
print(new_data)
# array([[0, 0, 1, 0, 0],
#        [1, 0, 0, 0, 0]])
result_1 = model.inverse_transform(model.transform(new_data))
print(result_1)
# array([[ 0.09232497,  0.38903892,  0.36668712,  0.23067627,  0.1383513 ],
#        [ 0.0877082 ,  0.        ,  0.12131779,  0.21914115,  0.13143295]])
temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)), model.components_)
result_2 = np.dot(new_data, transform)
print(result_2)
# array([[ 0.09232484,  0.389039  ,  0.36668699,  0.23067595,  0.13835111],
#        [ 0.09193481, -0.05671439,  0.09232484,  0.22970145,  0.13776664]])
Note: Although this is not the best data for describing my issue, the code is essentially the same. Also, result_1 and result_2 differ from each other much more in my actual case, where data and new_data are large arrays.
What happens
In scikit-learn, NMF does more than simple matrix multiplication: it optimizes!
Decoding (inverse_transform) is linear: the model calculates X_decoded = dot(W, H), where W is the encoded matrix and H = model.components_ is a learned matrix of model parameters.
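This is easy to verify. Below is a minimal check, a sketch reusing the question's toy data with the NMF options trimmed to the ones that matter here; if decoding is indeed plain matrix multiplication, it prints True:

import numpy as np
from sklearn.decomposition import NMF

# Fit on the question's toy data and grab the factor matrices.
data = np.array([[0, 0, 1, 1, 1], [0, 1, 1, 0, 0],
                 [0, 1, 0, 0, 0], [1, 0, 0, 1, 0]])
model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(data)  # encoded matrix W
H = model.components_          # learned parameter matrix H

# Decoding should coincide with dot(W, H).
print(np.allclose(model.inverse_transform(W), np.dot(W, H)))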
Encoding (transform), however, is nonlinear: it performs W = argmin(loss(X_original, H, W)) (with respect to W only), where the loss is the mean squared error between X_original and dot(W, H), plus some additional penalties (the L1 and L2 norms of W), and with the constraint that W must be non-negative. Minimization is performed by coordinate descent, and the result may be nonlinear in X_original. Thus, you cannot simply get W by multiplying matrices.
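To make the optimization view concrete, here is a sketch that recomputes the encoding row by row with an off-the-shelf non-negative least-squares solver (scipy.optimize.nnls). It ignores the penalties, which are zero in the question's setup (alpha=0.0), so the two prints should agree up to the coordinate-descent tolerance:

import numpy as np
from scipy.optimize import nnls
from sklearn.decomposition import NMF

data = np.array([[0, 0, 1, 1, 1], [0, 1, 1, 0, 0],
                 [0, 1, 0, 0, 0], [1, 0, 0, 1, 0]])
new_data = np.array([[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]])
model = NMF(n_components=2, init='random', random_state=0).fit(data)
H = model.components_

# Each row w of W solves min ||x - dot(w, H)||^2 subject to w >= 0.
W_nnls = np.array([nnls(H.T, row)[0] for row in new_data])
print(W_nnls)                     # non-negative by construction
print(model.transform(new_data))  # coordinate descent, same objective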
Why it is so weird
NMF has to perform such strange calculations because, otherwise, the model may produce negative results. Indeed, in your own example, you could try to perform the transform by plain matrix multiplication,

print(np.dot(new_data, np.dot(model.components_.T, np.linalg.pinv(temp))))

and get a result W that contains negative numbers. However, the coordinate descent within NMF avoids this problem by slightly modifying the matrix:

print(model.transform(new_data))

gives a non-negative result. You can see that it does not simply clip the W matrix from below, but modifies the positive elements as well, in order to improve the fit (and obey the regularization penalties).
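If you want to see this for yourself, here is a sketch comparing a naively clipped least-squares encoding against model.transform on the question's data; in the row where the least-squares solution goes negative, expect the surviving positive entry to shift as well rather than stay at its clipped value:

import numpy as np
from sklearn.decomposition import NMF

data = np.array([[0, 0, 1, 1, 1], [0, 1, 1, 0, 0],
                 [0, 1, 0, 0, 0], [1, 0, 0, 1, 0]])
new_data = np.array([[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]])
model = NMF(n_components=2, init='random', random_state=0).fit(data)
H = model.components_

# Unconstrained least-squares encoding, then a naive clip at zero.
temp = np.dot(H, H.T)
W_ls = np.dot(new_data, np.dot(H.T, np.linalg.pinv(temp)))
W_clipped = np.clip(W_ls, 0, None)

print(W_clipped)                  # negatives zeroed, positives untouched
print(model.transform(new_data))  # re-optimized: positives move too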