Group and average NumPy matrix

Say I have an arbitrary numpy matrix that looks like this:

arr = [[  6.0   12.0   1.0]
       [  7.0   9.0   1.0]
       [  8.0   7.0   1.0]
       [  4.0   3.0   2.0]
       [  6.0   1.0   2.0]
       [  2.0   5.0   2.0]
       [  9.0   4.0   3.0]
       [  2.0   1.0   4.0]
       [  8.0   4.0   4.0]
       [  3.0   5.0   4.0]]

What would be an efficient way of averaging rows that are grouped by their third column number?

The expected output would be:

result = [[  7.0  9.33  1.0]
          [  4.0  3.0  2.0]
          [  9.0  4.0  3.0]
          [  4.33  3.33  4.0]]

标签： python numpy matrix grouping average

4条回答

叛逆

2楼-- · 2020-07-05 07:36

A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a fully vectorized solution:

import numpy_indexed as npi
npi.group_by(arr[:, 2]).mean(arr)

0人赞添加讨论(0) 举报

老娘就宠你

3楼-- · 2020-07-05 07:48

arr = np.array(
[[  6.0,   12.0,   1.0],
 [  7.0,   9.0,   1.0],
 [  8.0,   7.0,   1.0],
 [  4.0,   3.0,   2.0],
 [  6.0,   1.0,   2.0],
 [  2.0,   5.0,   2.0],
 [  9.0,   4.0,   3.0],
 [  2.0,   1.0,   4.0],
 [  8.0,   4.0,   4.0],
 [  3.0,   5.0,   4.0]])
np.array([a.mean(0) for a in np.split(arr, np.argwhere(np.diff(arr[:, 2])) + 1)])

0人赞添加讨论(0) 举报

叛逆

4楼-- · 2020-07-05 07:50

You can do:

for x in sorted(np.unique(arr[...,2])):
    results.append([np.average(arr[np.where(arr[...,2]==x)][...,0]), 
                    np.average(arr[np.where(arr[...,2]==x)][...,1]),
                    x])

Testing:

>>> arr
array([[  6.,  12.,   1.],
       [  7.,   9.,   1.],
       [  8.,   7.,   1.],
       [  4.,   3.,   2.],
       [  6.,   1.,   2.],
       [  2.,   5.,   2.],
       [  9.,   4.,   3.],
       [  2.,   1.,   4.],
       [  8.,   4.,   4.],
       [  3.,   5.,   4.]])
>>> results=[]
>>> for x in sorted(np.unique(arr[...,2])):
...     results.append([np.average(arr[np.where(arr[...,2]==x)][...,0]), 
...                     np.average(arr[np.where(arr[...,2]==x)][...,1]),
...                     x])
... 
>>> results
[[7.0, 9.3333333333333339, 1.0], [4.0, 3.0, 2.0], [9.0, 4.0, 3.0], [4.333333333333333, 3.3333333333333335, 4.0]]

The array arr does not need to be sorted, and all the intermediate arrays are views (ie, not new arrays of data). The average is calculated efficiently directly from those views.

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

5楼-- · 2020-07-05 07:54

solution

from itertools import groupby
from operator import itemgetter

arr = [[6.0, 12.0, 1.0],
       [7.0, 9.0, 1.0],
       [8.0, 7.0, 1.0],
       [4.0, 3.0, 2.0],
       [6.0, 1.0, 2.0],
       [2.0, 5.0, 2.0],
       [9.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [8.0, 4.0, 4.0],
       [3.0, 5.0, 4.0]]

result = []

for groupByID, rows in groupby(arr, key=itemgetter(2)):
    position1, position2, counter = 0, 0, 0
    for row in rows:
        position1+=row[0]
        position2+=row[1]
        counter+=1
    result.append([position1/counter, position2/counter, groupByID])

print(result)

would output:

[[7.0, 9.333333333333334, 1.0]]
[[4.0, 3.0, 2.0]]
[[9.0, 4.0, 3.0]]
[[4.333333333333333, 3.3333333333333335, 4.0]]

0人赞添加讨论(0) 举报

Group and average NumPy matrix

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间