I'm new to Python, coming from MATLAB. I have a large sparse matrix saved in MATLAB v7.3 (HDF5) format. So far I've found two ways of loading the file, using h5py and tables. However, operating on the matrix seems to be extremely slow with either. For example, in MATLAB:
>> whos
  Name        Size                Bytes  Class     Attributes
  M       11337x133338         77124408  double    sparse
>> tic, sum(M(:)); toc
Elapsed time is 0.086233 seconds.
Using tables:
import time
t = time.time()
sum(f.root.M.data)  # f is the already-open PyTables file
elapsed = time.time() - t
print(elapsed)
35.929461956
Using h5py:
import time
t = time.time()
sum(f["M"]["data"])  # f is the already-open h5py file
elapsed = time.time() - t
print(elapsed)
(I gave up waiting ...)
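To separate the cost of reading from the cost of summing, the same timing pattern can be applied to an array that is already in memory. This is only a minimal sketch: the array `x` is a synthetic stand-in, not the real dataset, and its size is an arbitrary assumption. It compares the built-in `sum`, which iterates element by element in Python, with the array's own `sum` method, which runs in compiled code.

```python
import time
import numpy as np

# Synthetic stand-in for the matrix values (assumption, not the real data).
x = np.ones(200_000)

# Built-in sum: loops over the elements one at a time in Python.
t = time.perf_counter()
builtin_total = sum(x)
builtin_elapsed = time.perf_counter() - t

# NumPy sum: a single vectorized call on the in-memory array.
t = time.perf_counter()
numpy_total = x.sum()
numpy_elapsed = time.perf_counter() - t

print(builtin_elapsed, numpy_elapsed)
```

If the two timings differ widely on an in-memory array, the way the sum is computed may matter as much as the library used to open the file.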
[EDIT]
Based on the comments from @bpgergo, I should add that I've tried converting the result loaded in by h5py (f) into a numpy array or a scipy sparse array in the following two ways:
from scipy import sparse
A = sparse.csc_matrix((f["M"]["data"], f["M"]["ir"], f["M"]["jc"]))
or
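import numpy
from scipy import sparse
data = numpy.asarray(f["M"]["data"])
ir = numpy.asarray(f["M"]["ir"])
jc = numpy.asarray(f["M"]["jc"])
# jc holds column pointers, not column indices, so the matching format is CSC
A = sparse.csc_matrix((data, ir, jc))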
but both of these operations are extremely slow as well.
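For anyone without access to the file, the conversion described above can be reproduced on a toy example. This is a minimal sketch under stated assumptions: the three small arrays are made-up stand-ins for `f["M"]["data"]`, `f["M"]["ir"]` and `f["M"]["jc"]`, and `n_rows` is a hypothetical variable, since the row count is not contained in those three arrays.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for the three datasets (assumption, not real data).
# MATLAB stores sparse matrices column-compressed: ir holds the row index
# of each stored value, jc holds column pointers (number of columns + 1).
data = np.array([1.0, 2.0, 3.0, 4.0])
ir = np.array([0, 2, 1, 2])
jc = np.array([0, 2, 3, 4])

n_rows = 3  # hypothetical; must be supplied or inferred separately
A = sparse.csc_matrix((data, ir, jc), shape=(n_rows, len(jc) - 1))

total = A.sum()      # sums only the stored entries
dense = A.toarray()  # small enough here to inspect densely
print(total)
print(dense)
```

Here everything is an in-memory NumPy array before the sparse matrix is built, so the construction itself does not touch the HDF5 file.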
Is there something I'm missing here?