Is there any numpy group by function?

Posted 2020-01-24 13:11

Is there a function in NumPy that groups the array below by its first column?

I couldn't find a good answer online.

>>> a
array([[  1, 275],
       [  1, 441],
       [  1, 494],
       [  1, 593],
       [  2, 679],
       [  2, 533],
       [  2, 686],
       [  3, 559],
       [  3, 219],
       [  3, 455],
       [  4, 605],
       [  4, 468],
       [  4, 692],
       [  4, 613]])

Wanted output:

array([[[275, 441, 494, 593]],
       [[679, 533, 686]],
       [[559, 219, 455]],
       [[605, 468, 692, 613]]], dtype=object)

7 answers
Fickle 薄情
#2 · 2020-01-24 13:53

Inspired by Eelco Hoogendoorn's library, but without using it, and relying on the fact that the first column of your array is always increasing:

>>> np.split(a[:, 1], np.cumsum(np.unique(a[:, 0], return_counts=True)[1])[:-1])
[array([275, 441, 494, 593]),
 array([679, 533, 686]),
 array([559, 219, 455]),
 array([605, 468, 692, 613])]
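
Broken into steps, the one-liner above can be read as follows. This is only a sketch using the array from the question; the intermediate names `keys`, `counts` and `boundaries` are mine, not part of NumPy.

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

# Distinct keys in the first column and how many rows each one has.
keys, counts = np.unique(a[:, 0], return_counts=True)

# Running totals give the index where each group ends; the last total
# equals len(a), so it is dropped to avoid a trailing empty group.
boundaries = np.cumsum(counts)[:-1]

# Cut the second column at those indices (valid because column 0 is sorted).
groups = np.split(a[:, 1], boundaries)
```

Here `boundaries` holds the positions 4, 7 and 10, which is where the key changes from 1 to 2, 2 to 3 and 3 to 4.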

I didn't time it, but this is probably one of the faster ways to do what the question asks:

  • No native Python loop
  • The resulting groups are NumPy arrays, so no conversion is needed if you want to run further NumPy operations on them
  • Complexity is close to O(n): the cumsum and split are linear, and only the sort inside np.unique is O(n log n)
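
If the first column is not already sorted, the same idea should still work after a stable sort. The following is a minimal sketch under that assumption; `group_by_first_column` and the sample array `b` are hypothetical names made up for illustration.

```python
import numpy as np

def group_by_first_column(arr):
    """Group column 1 of a 2-D array by the values in column 0."""
    # A stable sort keeps rows with equal keys in their original order.
    order = np.argsort(arr[:, 0], kind="stable")
    sorted_arr = arr[order]
    # Index of the first row of each key in the sorted array.
    keys, first_idx = np.unique(sorted_arr[:, 0], return_index=True)
    # Split before the first row of every group except the first one.
    groups = np.split(sorted_arr[:, 1], first_idx[1:])
    return keys, groups

b = np.array([[2, 679], [1, 275], [2, 533], [1, 441], [3, 559]])
keys, groups = group_by_first_column(b)
# keys   -> [1, 2, 3]
# groups -> [275, 441], [679, 533], [559]
```

The extra argsort adds another O(n log n) step, so this variant is only worth it when the keys really can arrive out of order.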

PS: I wrote a similar line because I needed to "group by" the results of np.nonzero:

>>> indexes, values = np.nonzero(...)
>>> np.split(values, np.cumsum(np.unique(indexes, return_counts=True)[1])[:-1])
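
As a concrete illustration of the np.nonzero case, here is a small sketch with a made-up 2-D array (`mask` is a hypothetical input, not from the original post). np.nonzero returns indices in row-major order, so the row indices come out sorted and the same split trick applies. Dropping the last cumulative sum avoids a trailing empty array.

```python
import numpy as np

# Hypothetical input: nonzero entries mark the positions of interest.
mask = np.array([[0, 1, 1],
                 [1, 0, 0],
                 [0, 0, 0],
                 [1, 1, 1]])

# Row and column indices of the nonzero entries, in row-major order.
rows, cols = np.nonzero(mask)

# Number of nonzero entries for each row that has at least one.
counts = np.unique(rows, return_counts=True)[1]

# Column indices grouped by row.
groups = np.split(cols, np.cumsum(counts)[:-1])
# groups -> [1, 2], [0], [0, 1, 2]
```

Note that row 2 has no nonzero entries, so it produces no group at all; the result has three groups, not four.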