I'm looking for an NMF implementation that has a Python interface and handles both missing data and zeros.
I don't want to impute my missing values before starting the factorization; I want them to be ignored in the minimized objective.
It seems that neither scikit-learn, nor nimfa, nor graphlab, nor mahout offers such an option.
Thanks!
Using this MATLAB-to-Python code conversion sheet, I was able to rewrite NMF from the MATLAB toolbox library.
I had to decompose a 40k × 1k matrix with a sparsity of 0.7%. Using 500 latent features, my machine took 20 minutes for 100 iterations. Here is the method:
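What follows is a minimal sketch of this kind of masked factorisation, using standard multiplicative updates weighted by a 0/1 mask; the function name nmf_masked and the n_iter and eps parameters are illustrative, not taken from the toolbox code:

```python
import numpy as np

def nmf_masked(A, mask, k, n_iter=100, eps=1e-9):
    """Masked NMF via multiplicative updates.

    A    : (m, n) non-negative array, with missing entries set to 0
    mask : (m, n) 0/1 array, 1 where A is observed
    k    : number of latent features
    """
    m, n = A.shape
    W = np.random.rand(m, k)
    H = np.random.rand(k, n)
    for _ in range(n_iter):
        # Multiplicative updates, with the residual weighted by the mask
        # so that missing entries do not contribute to the objective
        H *= (W.T @ (mask * A)) / (W.T @ (mask * (W @ H)) + eps)
        W *= ((mask * A) @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W, H

# Usage in the setting described below, where X is a SciPy sparse matrix:
# A = X.toarray()      # missing values become 0
# mask = np.sign(A)    # 1 where observed, 0 where missing
# W, H = nmf_masked(A, mask, k=500)
```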
Here I was using a SciPy sparse matrix as input, and missing values were converted to 0 using the toarray() method. Therefore, the mask was created using the numpy.sign() function. However, if you have nan values, you can get the same results by using the numpy.isnan() function.

SciPy has a method to solve the non-negative least squares problem (NNLS). In this answer, I am reproducing my blog post on using SciPy's NNLS for non-negative matrix factorisation. You may also be interested in my other blog posts that use autograd, Tensorflow and CVXPY for NNMF.
Goal: Given a matrix A, our goal is to decompose it into two non-negative factors, as follows:
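A (M × N) ≈ W (M × K) × H (K × N), with all entries of W and H non-negative.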
Overview
Our solution consists of two steps. First, we fix W and learn H, given A. Next, we fix H and learn W, given A. We repeat this procedure iteratively. Fixing one variable and learning the other (in this setting) is popularly known as alternating least squares, as the problem is reduced to a least squares problem. However, an important thing to note is that since we want to constrain W and H to be non-negative, we use NNLS instead of ordinary least squares.
Step 1: Learning H, given A and W
Using the illustration above, we can learn each column of H using the corresponding column from A and the matrix W.
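That is, each column of H is obtained by solving its own non-negative least squares problem:

min_{h_j ≥ 0} ‖a_j − W h_j‖₂

where a_j and h_j denote the j-th columns of A and H; this is exactly the form that scipy.optimize.nnls solves.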
Handling missing entries in A
In the problem of collaborative filtering, A is usually the user-item matrix and it has a lot of missing entries. These missing entries correspond to users who have not rated items. We can modify our formulation to account for these missing entries. Consider that only M′ ≤ M entries in a column of A have observed data; we now modify the above equation as:
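min_{h_j ≥ 0} ‖mask_j ∘ (a_j − W h_j)‖₂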
where the mask is found by considering only the M′ observed entries.
Step 2: Learning W, given A and H
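By symmetry, we can learn each row of W using the corresponding row of A and the matrix H: each row w_i solves min_{w_i ≥ 0} ‖a_iᵀ − Hᵀ w_iᵀ‖₂, again restricted to the observed entries of that row.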
Code example
Imports
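A minimal sketch, assuming only NumPy and SciPy's NNLS solver are needed:

```python
import numpy as np
from scipy.optimize import nnls
```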
Creating matrix to be factorised
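For illustration, a small random non-negative matrix (the 20 × 10 size and the seed are arbitrary choices for this sketch):

```python
np.random.seed(0)  # arbitrary seed, for reproducibility
M, N = 20, 10      # illustrative dimensions
A_orig = np.random.uniform(low=0.0, high=1.0, size=(M, N))  # non-negative by construction
```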
Masking a few entries
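One way to hide a few entries is to set them to nan and keep a boolean mask of the observed positions (the chosen indices are arbitrary):

```python
A = A_orig.copy()
A[0, 0] = np.nan
A[3, 1] = np.nan
A[6, 3] = np.nan

mask = ~np.isnan(A)  # True where A is observed
```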
Defining matrices W and H
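Random non-negative initialisations, with an assumed rank of K = 4:

```python
K = 4  # number of latent factors (assumed for this sketch)
W = np.random.uniform(low=0.0, high=1.0, size=(M, K))
H = np.random.uniform(low=0.0, high=1.0, size=(K, N))
```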
Defining the cost that we want to minimise
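A sketch of the plain Frobenius-norm reconstruction cost:

```python
def cost(A, W, H):
    # Frobenius norm of the reconstruction error over all entries
    # (note: this returns nan if A contains nan entries)
    return np.linalg.norm(A - W @ H, 'fro')
```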
However, since A has missing entries, we have to define the cost in terms of the entries present in A.
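A sketch of such a masked cost:

```python
def cost_masked(A, W, H, mask):
    # Measure the error only on the observed entries of A
    diff = (A - W @ H)[mask]
    return np.sqrt(np.sum(diff ** 2))
```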
Let us see the cost for the initial, randomly assigned values of W and H.
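With the definitions above:

```python
print(cost_masked(A, W, H, mask))
```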
Alternating NNLS procedure
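A sketch of the loop: in each pass, every column of H and then every row of W is re-solved by NNLS, using only the observed entries (num_iter is an arbitrary choice):

```python
num_iter = 30
for it in range(num_iter):
    # Step 1: learn H, given A and W (one NNLS problem per column,
    # keeping only the rows observed in that column)
    for j in range(N):
        rows = mask[:, j]
        H[:, j], _ = nnls(W[rows], A[rows, j])
    # Step 2: learn W, given A and H (one NNLS problem per row,
    # keeping only the columns observed in that row)
    for i in range(M):
        cols = mask[i, :]
        W[i, :], _ = nnls(H[:, cols].T, A[i, cols])
    print(it, cost_masked(A, W, H, mask))
```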
Let's view the values of the masked entries.
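One way to compare them against the held-out originals:

```python
A_est = W @ H
print("original :", A_orig[~mask])  # the entries we masked out
print("estimated:", A_est[~mask])   # their reconstructed estimates
```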