Why does it take so long to create a SparseDataFrame?

Posted 2019-04-28 04:32

Question:

Given the following code (executed in a Jupyter notebook):

In [1]: import pandas as pd
        %time df=pd.SparseDataFrame(index=range(0,1000), columns=range(0,1000));

CPU times: user 3.89 s, sys: 30.3 ms, total: 3.92 s
Wall time: 3.92 s

Why does it take so long to create a sparse data frame?

Note that increasing the number of rows seems to make no difference. But when I increase the number of columns from 1000 to, say, 10000, the code seems to take forever, and I have always had to abort it.

Compare this with scipy's sparse matrix:

In [2]: from scipy.sparse import lil_matrix
        %time m=lil_matrix((1000, 1000))

CPU times: user 1.09 ms, sys: 122 µs, total: 1.21 ms
Wall time: 1.18 ms
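
To put numbers on how construction time grows with the number of columns, instead of relying on single %time runs, a small timing helper could be used. This is only a sketch: `time_construction` and the stand-in factory are hypothetical names introduced here for illustration, and the stand-in builds a plain list so that the block runs without pandas or scipy installed.

```python
import time

def time_construction(factory, sizes, repeats=3):
    """Return {size: best wall-clock seconds} for calling factory(size)."""
    results = {}
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            factory(n)  # build the object; the result is discarded
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results

# Stand-in factory (assumption): replace it with the constructor under test.
timings = time_construction(lambda n: [None] * n, [10, 100, 1000])
for n, seconds in timings.items():
    print(f"{n:>6} columns: {seconds:.6f} s")
```

To reproduce the observation above, the factory could be `lambda n: pd.SparseDataFrame(index=range(1000), columns=range(n))` or `lambda n: lil_matrix((1000, n))`. Note that `SparseDataFrame` only exists in pandas releases before 1.0, so the first variant needs an older pandas.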