I have a list of lists with different lengths (e.g. [[1, 2, 3], [4, 5], [6, 7, 8, 9]]) and want to convert it into a numpy array of integers. I understand that the 'sub' arrays in a numpy multidimensional array must all be the same length. So what is the most efficient way to convert a list like the one in the example above into a numpy array like this: [[1, 2, 3, 0], [4, 5, 0, 0], [6, 7, 8, 9]], i.e. padded with zeros?
Answer 1:
You could make a numpy array with np.zeros and fill it with your list elements, as shown below.
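import numpy as np

a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
# One row per sublist, as wide as the longest sublist (float dtype by default)
b = np.zeros([len(a), len(max(a, key=len))])
for i, j in enumerate(a):
    b[i][0:len(j)] = j  # copy each sublist into the start of its row
print(b)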
results in
[[ 1.  2.  3.  0.]
 [ 4.  5.  0.  0.]
 [ 6.  7.  8.  9.]]
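The loop above yields a float array because np.zeros defaults to float64, while the question asks for integers. A minimal sketch of the same approach with dtype=int might look like the following. The helper name pad_with_zeros is hypothetical (it is not part of NumPy), and the sketch assumes the outer list is non-empty.

```python
import numpy as np

def pad_with_zeros(rows):
    """Hypothetical helper: right-pad each sublist with zeros into a 2-D int array."""
    # Assumes `rows` is non-empty; max() raises ValueError otherwise.
    width = max(len(row) for row in rows)
    out = np.zeros((len(rows), width), dtype=int)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row  # fill the leading slots, leave the rest as 0
    return out

b = pad_with_zeros([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
```

Here b should be a 3x4 integer array with the shorter rows padded on the right.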
Answer 2:
Here's a @Divakar type of answer:
In [945]: ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
In [946]: lens = [len(l) for l in ll] # only iteration
In [947]: lens
Out[947]: [3, 2, 4]
In [948]: maxlen=max(lens)
In [949]: arr = np.zeros((len(ll),maxlen),int)
In [950]: mask = np.arange(maxlen) < np.array(lens)[:,None] # key line
In [951]: mask
Out[951]:
array([[ True,  True,  True, False],
       [ True,  True, False, False],
       [ True,  True,  True,  True]], dtype=bool)
In [952]: arr[mask] = np.concatenate(ll) # fast 1d assignment
In [953]: arr
Out[953]:
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [6, 7, 8, 9]])
For large lists this has the potential to be faster, because the only Python-level iteration is the one that collects the lengths. But it's harder to understand and/or recreate.
The question "Convert Python sequence to NumPy array, filling missing values" has a good post by Divakar; itertools.zip_longest is also mentioned there. This could be cited as a duplicate.
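For reference, a rough sketch of the itertools.zip_longest idea mentioned above could look like this. It is one plausible way to use zip_longest for padding, not necessarily the exact code from the linked post.

```python
from itertools import zip_longest

import numpy as np

ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# zip_longest groups the i-th elements of every sublist and fills the gaps
# with 0, so each tuple is a column; transposing gives one row per sublist.
arr = np.array(list(zip_longest(*ll, fillvalue=0))).T
```

The transpose is needed because zip_longest iterates across the sublists rather than along them.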
Answer 3:
Do some preprocessing on the list, by padding the shorter sublists, before converting to a numpy array:
>>> import numpy as np
>>> lst = [[1, 2, 3], [4, 5], [1, 7, 8, 9]]
>>> pad = len(max(lst, key=len))
>>> np.array([i + [0]*(pad-len(i)) for i in lst])
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [1, 7, 8, 9]])