How can I use numpy/scipy to flatten a nested list with sublists of different sizes? Speed is very important and the lists are large.
lst = [[1, 2, 3, 4],[2, 3],[1, 2, 3, 4, 5],[4, 1, 2]]
Is anything faster than this?
vec = sp.array(list(chain(*lst)))
How about np.fromiter:
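A minimal sketch of what that could look like, assuming integer data and using chain.from_iterable to feed the iterator. The count argument is optional; passing the total length lets NumPy allocate the output once.

```python
from itertools import chain

import numpy as np

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]

# np.fromiter needs an explicit dtype; int is an assumption about the data.
total = sum(map(len, lst))
vec = np.fromiter(chain.from_iterable(lst), dtype=int, count=total)
print(vec)
```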
Use chain.from_iterable. This avoids using *, which is quite expensive to handle if the iterable has many sublists.
Another option might be to sum the lists. Note, however, that this will cause quadratic reallocation. Something like this performs much better:
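For illustration only (this is not the original poster's code), here is a rough sketch of the options mentioned above. The helper name flatten_extend is hypothetical, and numpy's np.array stands in for sp.array. The in-place extend loop is one simple way to avoid the repeated copying that sum does.

```python
from itertools import chain

import numpy as np

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]

# Option 1: chain.from_iterable walks the sublists lazily, with no * unpacking.
flat_chain = list(chain.from_iterable(lst))

# Option 2: sum() builds a new list at every step, copying the growing
# result each time, so the total work is quadratic in the number of sublists.
flat_sum = sum(lst, [])

# Option 3 (hypothetical helper): extend a single result list in place,
# which is linear in the total number of elements.
def flatten_extend(lists):
    out = []
    for sub in lists:
        out.extend(sub)
    return out

flat_extend = flatten_extend(lst)

vec = np.array(flat_chain)
```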
On my machine I get:
As you can see, a 16x speed-up. The chain.from_iterable approach is even faster: another 6x speed-up.
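Timings depend on the machine and the data, so it is worth measuring yourself. A small sketch of how one might compare the approaches with timeit follows; the test data, the function names with_sum and with_chain, and the repeat counts are all assumptions chosen to keep the run short.

```python
import random
import timeit
from itertools import chain

# Hypothetical test data: 200 sublists with lengths 10..209.
L = [[random.randint(0, 500) for _ in range(n)] for n in range(10, 210)]

def with_sum(lists):
    # Quadratic: every addition copies the accumulated list.
    return sum(lists, [])

def with_chain(lists):
    # Linear: elements are consumed one sublist after another.
    return list(chain.from_iterable(lists))

# Both approaches must agree before their timings are compared.
assert with_sum(L) == with_chain(L)

t_sum = min(timeit.repeat(lambda: with_sum(L), number=5, repeat=3))
t_chain = min(timeit.repeat(lambda: with_chain(L), number=5, repeat=3))
print(f"sum:   {t_sum:.4f} s")
print(f"chain: {t_chain:.4f} s")
```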
I looked for a "pure-python" solution, not knowing numpy. I believe Abhijit/unutbu/senderle's solution is the way to go in your case.
You can try numpy.hstack.
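A short sketch of the hstack approach on the list from the question. np.hstack converts each sublist to a 1-D array and concatenates them, so sublists of different lengths are fine.

```python
import numpy as np

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]

# Each sublist becomes its own 1-D array, then all are joined end to end.
vec = np.hstack(lst)
print(vec)
```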
How about trying:
The fastest way to create a numpy array from an iterator is to use numpy.fromiter. As you can see, this is faster than converting to a list, and much faster than hstack.
Use a function to flatten the list.
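One possible shape for such a function, as a sketch only: the name flatten is hypothetical, and this version assumes the nesting consists of plain Python lists. It handles arbitrary depth, which the question does not strictly need, and for large inputs it will likely be slower than chain.from_iterable.

```python
import numpy as np

def flatten(nested):
    """Yield the leaf values of an arbitrarily nested list, left to right."""
    for item in nested:
        if isinstance(item, list):
            # Recurse into sublists and pass their values through.
            yield from flatten(item)
        else:
            yield item

lst = [[1, 2, 3, 4], [2, 3], [1, 2, 3, 4, 5], [4, 1, 2]]
vec = np.array(list(flatten(lst)))
print(vec)
```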