Is there a NumPy function that groups the array below by its first column?
I couldn't find a good answer online.
>>> a
array([[ 1, 275],
       [ 1, 441],
       [ 1, 494],
       [ 1, 593],
       [ 2, 679],
       [ 2, 533],
       [ 2, 686],
       [ 3, 559],
       [ 3, 219],
       [ 3, 455],
       [ 4, 605],
       [ 4, 468],
       [ 4, 692],
       [ 4, 613]])
Desired output:
array([[[275, 441, 494, 593]],
       [[679, 533, 686]],
       [[559, 219, 455]],
       [[605, 468, 692, 613]]], dtype=object)
Given X as an array of items to be grouped and y (a 1D array) as the corresponding group labels, the following function does the grouping with NumPy:
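The function body itself is not shown here. A minimal sketch of a function with the same call signature, groupby(X, y), assuming the approach is "stable-sort by label, then split where the label changes", could look like this (hypothetical, not necessarily the answerer's code):

```python
import numpy as np

def groupby(X, y):
    """Split X into a list of arrays, one per distinct label in y (in sorted label order).

    Hypothetical sketch: sort by label, then cut wherever the label changes.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    # A stable sort keeps items in their original order within each group.
    order = np.argsort(y, kind="stable")
    X_sorted, y_sorted = X[order], y[order]
    # Indices where the sorted label differs from its predecessor are group boundaries.
    boundaries = np.flatnonzero(y_sorted[1:] != y_sorted[:-1]) + 1
    return np.split(X_sorted, boundaries)

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])
print(groupby(a[:, 1], a[:, 0]))
```

Sorting first means y does not need to be pre-sorted; the split is O(n log n) because of the sort, with no per-group Python loop.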
So
groupby(a[:,1], a[:,0])
returns
[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]
The numpy_indexed package (disclaimer: I am its author) aims to fill this gap in numpy. All operations in numpy-indexed are fully vectorized, and no O(n^2) algorithms were harmed during the making of this library.
Note that it is usually more efficient to compute the relevant properties over such groups directly (i.e., group_by(keys).mean(values)) rather than first splitting into a list or jagged array.
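numpy_indexed's own API is not reproduced here. As a plain-NumPy illustration of the same idea (a swapped-in technique, not the library's implementation), per-group means can be computed without ever building a jagged list by combining np.unique(..., return_inverse=True) with np.bincount:

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])
keys, values = a[:, 0], a[:, 1]

# inverse[i] is the position of keys[i] within unique_keys.
unique_keys, inverse = np.unique(keys, return_inverse=True)
# Sum of values and number of items per group, each in one vectorized pass.
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
means = sums / counts
print(unique_keys, means)
```

Because every group is reduced in a single bincount call, this stays fully vectorized regardless of how many groups there are.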
Simplifying Vincent J's answer, one can use return_index=True instead of return_counts=True and get rid of the cumsum.
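The code for this variant is not shown. A minimal sketch, assuming the rows are first stable-sorted by key (np.unique's first-occurrence indices only mark group starts when equal keys are contiguous), might be:

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

# Make equal keys contiguous; the stable sort keeps within-group order.
a_sorted = a[np.argsort(a[:, 0], kind="stable")]
# first_idx[j] is where the j-th distinct key first appears, i.e. where group j starts.
_, first_idx = np.unique(a_sorted[:, 0], return_index=True)
# Drop the leading 0: np.split wants only the interior cut points.
groups = np.split(a_sorted[:, 1], first_idx[1:])
print(groups)
```

With return_counts=True the cut points would instead be np.cumsum(counts)[:-1]; return_index hands over the group starts directly, which is why the cumsum becomes unnecessary.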
I used np.unique() followed by np.extract().
[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]
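The code itself is not shown. A minimal sketch of an np.unique() plus np.extract() combination that produces a list like the one above (a reconstruction, not necessarily the original) is:

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])
keys, values = a[:, 0], a[:, 1]

# For each distinct key, np.extract returns the values where the boolean mask is True.
groups = [np.extract(keys == k, values) for k in np.unique(keys)]
print(groups)
```

This needs no sorting, but it scans the whole array once per distinct key, so it is O(n * k) and best suited to a small number of groups.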
NumPy is not very handy here because the desired output is not an array of integers (it is an array of list objects).
I suggest either the pure Python way...
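The pure Python code is not shown here. A minimal sketch of such an approach, assuming a dict of lists built with collections.defaultdict, could be:

```python
from collections import defaultdict

import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

# Map each key to the list of its values, preserving encounter order.
grouped = defaultdict(list)
for key, value in a.tolist():
    grouped[key].append(value)

# One list per key, ordered by key.
result = [grouped[k] for k in sorted(grouped)]
print(result)
```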
...or the pandas way:
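The pandas code is not shown here. A minimal sketch, assuming the array is wrapped in a DataFrame with hypothetical column names "key" and "value", could be:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

df = pd.DataFrame(a, columns=["key", "value"])
# groupby sorts the keys by default and keeps the original row order inside each group.
grouped = df.groupby("key")["value"].apply(list)
# grouped is a Series indexed by key; tolist() turns it into a list of lists.
result = grouped.tolist()
print(grouped)
```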