(This question relates to "populate a Pandas SparseDataFrame from a SciPy Sparse Matrix". I want to populate a SparseDataFrame from a scipy.sparse.coo_matrix specifically. The mentioned question is for a different SciPy sparse matrix format (csr), so here it goes.)
I noticed Pandas now has support for Sparse Matrices and Arrays. Currently, I create DataFrame()s like this:
return DataFrame(matrix.toarray(), columns=features, index=observations)
Is there a way to create a SparseDataFrame() with a scipy.sparse.coo_matrix() or coo_matrix()? Converting to dense format kills RAM badly. Thanks!
http://pandas.pydata.org/pandas-docs/stable/sparse.html#interaction-with-scipy-sparse
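As a minimal sketch of a sparse counterpart to the question's `DataFrame(matrix.toarray(), ...)` call: this assumes a recent pandas (0.25 or later), where, as far as I know, `SparseDataFrame` was replaced by the `DataFrame.sparse` accessor. The function name `coo_to_sparse_frame` and the sample labels are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def coo_to_sparse_frame(matrix, features, observations):
    """Hypothetical helper: build a sparse-backed DataFrame without toarray()."""
    # Assumes pandas >= 0.25, where DataFrame.sparse.from_spmatrix exists.
    return pd.DataFrame.sparse.from_spmatrix(
        matrix, index=observations, columns=features
    )


# Made-up 3x4 matrix with four non-zero entries.
row = np.array([0, 1, 2, 2])
col = np.array([0, 3, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0])
matrix = sparse.coo_matrix((data, (row, col)), shape=(3, 4))

features = ["f0", "f1", "f2", "f3"]
observations = ["obs_a", "obs_b", "obs_c"]

df = coo_to_sparse_frame(matrix, features, observations)

# Every column is stored with a sparse dtype, so no dense copy is kept.
all_sparse = all(isinstance(t, pd.SparseDtype) for t in df.dtypes)

# Going back to SciPy is also possible through the same accessor.
round_trip = df.sparse.to_coo()
```

On older pandas versions that still ship `SparseDataFrame`, the call would differ, so treat this as one possible route rather than the only one.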
Within scipy.sparse there are methods that convert the data forms to each other: .tocoo, .tocsc, etc. So you can use whichever form is best for a particular operation.
For going the other way, I've answered
Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory
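To illustrate the format conversions mentioned above, here is a small sketch that builds a coo matrix and moves it between formats. The matrix values are made up, and the dense comparison at the end is only reasonable because the example is tiny.

```python
import numpy as np
from scipy import sparse

# Made-up 3x4 matrix with four non-zero entries.
row = np.array([0, 1, 2, 2])
col = np.array([0, 3, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0])
coo = sparse.coo_matrix((data, (row, col)), shape=(3, 4))

csr = coo.tocsr()   # compressed sparse row: suited to row slicing
csc = coo.tocsc()   # compressed sparse column: suited to column slicing
back = csr.tocoo()  # back to coordinate (data, row, col) form

formats = (coo.format, csr.format, csc.format, back.format)

# The conversions change the storage layout, not the values.
same_values = np.array_equal(csr.toarray(), csc.toarray())
```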
Your linked answer from 2013 iterates by row, using toarray to make the row dense. I haven't looked at what the pandas from_coo does.
A more recent SO question on pandas sparse:
non-NDFFrame object error using pandas.SparseSeries.from_coo() function
From https://github.com/pydata/pandas/blob/master/pandas/sparse/scipy_sparse.py
In effect it takes the same data, i, j used to build a coo matrix, makes a series, sorts it, and turns it into a sparse series.
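The steps just described can be sketched by hand as follows. This is my reading of the description, not a copy of the pandas source; it assumes a recent pandas where `pd.SparseDtype` and the `Series.sparse.from_coo` accessor exist (the latter being, to my knowledge, the successor of `SparseSeries.from_coo`). The input values are made up and deliberately unsorted.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Made-up coo matrix whose entries are given in unsorted order.
row = np.array([2, 0, 1])
col = np.array([1, 0, 3])
data = np.array([3.0, 1.0, 2.0])
coo = sparse.coo_matrix((data, (row, col)), shape=(3, 4))

# 1. Make a series from data, indexed by the (i, j) coordinates.
manual = pd.Series(coo.data, index=pd.MultiIndex.from_arrays([coo.row, coo.col]))
# 2. Sort it by index.
manual = manual.sort_index()
# 3. Turn it into a sparse series (assumes pd.SparseDtype is available).
manual = manual.astype(pd.SparseDtype(manual.dtype))

# For comparison, the built-in accessor in recent pandas versions.
built_in = pd.Series.sparse.from_coo(coo)

manual_values = manual.sparse.to_dense().tolist()
built_in_values = built_in.sparse.to_dense().tolist()
```

Only the non-zero entries are ever materialised here, so nothing the size of the full dense matrix is allocated.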