I'm trying to create a sparse vector from a series of arrays where there are some overlapping indexes. For a matrix there's a very convenient object in scipy that does exactly this:
coo_matrix((data, (i, j)), [shape=(M, N)])
So if data happens to have repeated elements (because their i,j indexes are the same), those are summed up in the final sparse matrix. I was wondering if it would be possible to do something similar for sparse vectors, or do I just have to use this object and pretend it's a 1-column matrix?
While you might be able to reproduce a 1d equivalent, it would save a lot of work to just use a 1-row (or 1-column) sparse matrix. I am not aware of any sparse vector package for numpy.
The coo format stores the input arrays exactly as you give them, without the summing. The summing is done when the matrix is displayed or (otherwise) converted to csc or csr format. And since the csr constructor is compiled, it will do that summation faster than anything you could code in Python.
Construct a '1d' sparse coo matrix
In [67]: data=[10,11,12,14,15,16]
In [68]: col=[1,2,1,5,7,5]
In [70]: M=sparse.coo_matrix((data, (np.zeros(len(col)),col)),shape=(1,10))
Look at its data representation (no summation)
In [71]: M.data
Out[71]: array([10, 11, 12, 14, 15, 16])
In [72]: M.row
Out[72]: array([0, 0, 0, 0, 0, 0])
In [73]: M.col
Out[73]: array([1, 2, 1, 5, 7, 5])
Look at the array representation (shape (1,10))
In [74]: M.A
Out[74]: array([[ 0, 22, 11, 0, 0, 30, 0, 15, 0, 0]])
and the csr equivalent.
In [75]: M1=M.tocsr()
In [76]: M1.data
Out[76]: array([22, 11, 30, 15])
In [77]: M1.indices
Out[77]: array([1, 2, 5, 7])
In [78]: M1.indptr
Out[78]: array([0, 4])
In [79]: np.nonzero(M.A)
Out[79]: (array([0, 0, 0, 0]), array([1, 2, 5, 7]))
nonzero shows the same pattern:
In [80]: M.nonzero()
Out[80]: (array([0, 0, 0, 0, 0, 0]), array([1, 2, 1, 5, 7, 5]))
In [81]: M.tocsr().nonzero()
Out[81]: (array([0, 0, 0, 0]), array([1, 2, 5, 7]))
In [82]: np.nonzero(M.A)
Out[82]: (array([0, 0, 0, 0]), array([1, 2, 5, 7]))
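For anyone who wants to rerun the session above as a script, here is a minimal sketch using the same data. It assumes scipy is imported as `from scipy import sparse`, and it uses `toarray()` rather than the `.A` shorthand. It also calls `sum_duplicates()`, which collapses the repeated entries in place on the coo matrix itself; that step is an addition for illustration, not something the session above relies on.

```python
import numpy as np
from scipy import sparse

data = np.array([10, 11, 12, 14, 15, 16])
col = np.array([1, 2, 1, 5, 7, 5])       # columns 1 and 5 each appear twice
row = np.zeros(len(col), dtype=int)      # every entry sits in row 0

# coo keeps the inputs as given: 6 stored entries, duplicates included
M = sparse.coo_matrix((data, (row, col)), shape=(1, 10))
stored_before = M.nnz

# converting to csr sums the duplicates; M itself is left unchanged
M1 = M.tocsr()
print(M1.data, M1.indices)

# optional: collapse the duplicates in place on the coo matrix
M.sum_duplicates()
print(M.nnz, M.toarray())
```

After `sum_duplicates()` the coo matrix holds the same four values that the csr version does.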
M.toarray().flatten() will give you the (10,) 1d array.
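If the values really do arrive as a series of separate arrays, one way to apply the approach is to concatenate them first and let the csr conversion do the summing. The helper name `sparse_vector` and its arguments are hypothetical, made up for this sketch; only `np.concatenate`, `coo_matrix`, `tocsr` and `toarray` are actual numpy/scipy calls.

```python
import numpy as np
from scipy import sparse

def sparse_vector(index_arrays, value_arrays, length):
    """Hypothetical helper: build a 1-row csr 'vector' from several
    index/value arrays, summing values that share an index."""
    idx = np.concatenate(index_arrays)
    vals = np.concatenate(value_arrays)
    rows = np.zeros(len(idx), dtype=int)   # single row, so all zeros
    coo = sparse.coo_matrix((vals, (rows, idx)), shape=(1, length))
    return coo.tocsr()                     # duplicates are summed here

# three small arrays with overlapping indexes (1 and 5 repeat)
v = sparse_vector([[1, 2], [1, 5], [7, 5]],
                  [[10, 11], [12, 14], [15, 16]],
                  length=10)
dense = v.toarray().flatten()              # shape (10,) dense array
print(dense)
```

Keeping the result as a (1, N) csr matrix means the usual sparse arithmetic keeps working; flatten to a dense array only when you actually need one.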