Element-wise broadcasting for comparing two NumPy arrays

Let's say I have an array like this:

import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4,   2,  2,
                         2,  5,   7,  7,  8,  7,  12, 11])

Suppose I want to know: "how many elements in base_array are greater than 4?" This can be done simply by exploiting broadcasting:

np.sum(4 < base_array)

The answer is 7. Now suppose that, instead of comparing against a single value, I want to compare against a whole array. In other words, for each value c in comparison_array, count how many elements of base_array are greater than c. The naive approach fails because the two shapes are not broadcast-compatible:

comparison_array = np.arange(-13, 13)
comparison_result = np.sum(comparison_array < base_array)

Output:

Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    np.sum(comparison_array < base_array)
ValueError: operands could not be broadcast together with shapes (26,) (16,)

If I could somehow have each element of comparison_array get broadcast to base_array's shape, that would solve this. But I don't know how to do such an "element-wise broadcasting".

Now, I do know how to implement this for both cases using list comprehensions:

first = sum([4 < i for i in base_array])
second = [sum([c < i for i in base_array])
          for c in comparison_array]
print(first)
print(second)

Output:

7
[15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10, 7, 7, 7, 6, 6, 3, 2, 2, 2, 1, 0]

But as we all know, this will be orders of magnitude slower than a correctly-vectorized numpy implementation on larger arrays. So, how should I do this in numpy so that it's fast? Ideally this solution should extend to any kind of operation where broadcasting works, not just greater-than or less-than in this example.

Tags: python arrays numpy vectorization numpy-broadcasting

3 Answers

淡お忘

Post #2 · 2019-02-22 10:07

You will want to turn one of the arrays into a column (that is, add a unit dimension to it) for broadcasting to work correctly. When you broadcast two arrays together, the dimensions are lined up and any unit dimensions are effectively expanded to the non-unit size that they match. So two arrays of shape (16, 1) (the original array) and (1, 26) (the comparison array) broadcast to (16, 26).

Don't forget to sum across the dimension of size 16:

(base_array[:, None] > comparison_array).sum(axis=0)

None in an index expression is equivalent to np.newaxis: it is one of several ways to insert a new unit dimension at the specified position. You don't need to write comparison_array[None, :] because broadcasting aligns shapes starting from the trailing (last) dimension and automatically pads the shorter shape with leading unit dimensions, so (26,) is treated as (1, 26).
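As a quick check of the shapes involved, here is a minimal sketch that reuses the question's arrays. The names `mask` and `counts` are only illustrative and do not come from the answer.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# (16, 1) against (26,) broadcasts to (16, 26):
# row i holds base_array[i] compared with every threshold.
mask = base_array[:, None] > comparison_array
print(mask.shape)    # (16, 26)

# Reduce over the size-16 axis (axis 0) to get one count per threshold.
counts = mask.sum(axis=0)
print(counts.shape)  # (26,)
print(counts[:5])    # [15 15 14 14 13]
```

Summing over the other axis would instead give one count per element of base_array, which is not what the question asks for.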


Post #3 · 2019-02-22 10:09

You can simply add a dimension to the comparison array, so that the comparison is "stretched" across all values along the new dimension.

>>> np.sum(comparison_array[:, None] < base_array)
228

This is the fundamental principle with broadcasting, and works for all kinds of operations.

If you need the sum done along an axis, you just specify the axis along which you want to sum after the comparison.

>>> np.sum(comparison_array[:, None] < base_array, axis=1)
array([15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10,  7,  7,
        7,  6,  6,  3,  2,  2,  2,  1,  0])
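The same pattern should carry over to other element-wise operations. The sketch below shows two variations: the `outer` method that binary ufuncs such as `np.less` provide, and a made-up "within a tolerance" count. The tolerance `tol` and the variable names are assumptions for illustration only.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# ufunc.outer applies the operation to every pair; the result has shape
# (26, 16), the same as comparison_array[:, None] < base_array.
pairwise = np.less.outer(comparison_array, base_array)
greater_counts = pairwise.sum(axis=1)

# Any broadcastable expression works the same way. Hypothetical example:
# count the base values within `tol` of each comparison value.
tol = 1
near_counts = (np.abs(comparison_array[:, None] - base_array) <= tol).sum(axis=1)
```

Whether `outer` or explicit indexing with None reads better is mostly a matter of taste; both build the full pairwise array in memory.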


孤傲高冷的网名

Post #4 · 2019-02-22 10:10

Here's one with np.searchsorted with focus on memory efficiency and hence performance -

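def get_comparative_sum(base_array, comparison_array):
    # For each c in comparison_array, count the elements of base_array > c.
    n = len(base_array)
    base_array_sorted = np.sort(base_array)
    # side='right' returns, for each c, the number of sorted elements <= c,
    # so the remaining n - idx elements are strictly greater than c.
    idx = np.searchsorted(base_array_sorted, comparison_array, 'right')
    return n - idx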

Timings -

In [40]: np.random.seed(0)
    ...: base_array = np.random.randint(-1000,1000,(10000))
    ...: comparison_array = np.random.randint(-1000,1000,(20000))

# @miradulo's soln
In [41]: %timeit np.sum(comparison_array[:, None] < base_array, axis=1)
1 loop, best of 3: 386 ms per loop

In [42]: %timeit get_comparative_sum(base_array, comparison_array)
100 loops, best of 3: 2.36 ms per loop
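Part of the gap likely comes from the size of the intermediate: for these inputs the broadcast comparison materializes a 20000 x 10000 boolean array (about 200 MB at one byte per element), whereas the sorted approach only needs arrays of the input sizes. The sketch below is a small consistency check between the two approaches. `count_greater` is a hypothetical helper written for this check, not the function from the answer, and it relies on `np.searchsorted(..., side='right')` returning the number of sorted elements less than or equal to each query value.

```python
import numpy as np

def count_greater(base_array, comparison_array):
    """For each c in comparison_array, count elements of base_array > c."""
    base_sorted = np.sort(base_array)
    # side='right' gives, for each c, how many sorted elements are <= c.
    not_greater = np.searchsorted(base_sorted, comparison_array, side='right')
    return len(base_sorted) - not_greater

rng = np.random.default_rng(0)
base = rng.integers(-1000, 1000, size=500)
# Deliberately wider than base's range, to exercise out-of-range thresholds.
comp = rng.integers(-1500, 1500, size=800)

expected = np.sum(comp[:, None] < base, axis=1)
assert np.array_equal(count_greater(base, comp), expected)
```

The comparison values are drawn from a wider range than the base values so that thresholds below the minimum and above the maximum are covered as well.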
