Find the set difference between two large arrays (

I have two large 2-d arrays and I'd like to find their set difference taking their rows as elements. In Matlab, the code for this would be setdiff(A,B,'rows'). The arrays are large enough that the obvious looping methods I could think of take too long.

标签： python numpy set-difference

3条回答

啃猪蹄的小仙女

2楼-- · 2019-01-26 06:59

Here is a nice alternative pure numpy solution that works for 1.6.1. It does create an intermediate array, so this may or may not be a problem for you. It also does not rely on any speedup from a sorted array or not (as setdiff probably does).

from numpy import *
# Create some sample arrays
A =random.randint(0,5,(10,3))
B =random.randint(0,5,(10,3))

As an example, this is what I got - note that there is one common element:

>>> A
array([[1, 0, 3],
       [0, 4, 2],
       [0, 3, 4],
       [4, 4, 2],
       [2, 0, 2],
       [4, 0, 0],
       [3, 2, 2],
       [4, 2, 3],
       [0, 2, 1],
       [2, 0, 2]])
>>> B
array([[4, 1, 3],
       [4, 3, 0],
       [0, 3, 3],
       [3, 0, 3],
       [3, 4, 0],
       [3, 2, 3],
       [3, 1, 2],
       [4, 1, 2],
       [0, 4, 2],
       [0, 0, 3]])

We look for when the (L1) distance between the rows is zero. This gives us a matrix, which at the points where it is zero, these are the items common to both lists:

idx = where(abs((A[:,newaxis,:] - B)).sum(axis=2)==0)

As a check:

>>> A[idx[0]]
array([[0, 4, 2]])
>>> B[idx[1]]
array([[0, 4, 2]])

0人赞添加讨论(0) 举报

贼婆χ

3楼-- · 2019-01-26 07:10

This should work, but is currently broken in 1.6.1 due to an unavailable mergesort for the view being created. It works in the pre-release 1.7.0 version. This should be the fastest way possible, since the views don't have to copy any memory:

>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = a1.view([('', a1.dtype)] * a1.shape[1])
>>> a2_rows = a2.view([('', a2.dtype)] * a2.shape[1])
>>> np.setdiff1d(a1_rows, a2_rows).view(a1.dtype).reshape(-1, a1.shape[1])
array([[1, 2, 3]])

You can do this in Python, but it might be slow:

>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = set(map(tuple, a1))
>>> a2_rows = set(map(tuple, a2))
>>> a1_rows.difference(a2_rows)
set([(1, 2, 3)])

0人赞添加讨论(0) 举报

Explosion°爆炸

4楼-- · 2019-01-26 07:16

I'm not sure what you are going for, but this will get you a boolean array of where 2 arrays are not equal, and will be numpy fast:


import numpy as np
a = np.random.randn(5, 5)
b = np.random.randn(5, 5)
a[0,0] = 10.0
b[0,0] = 10.0 
a[1,1] = 5.0
b[1,1] = 5.0
c = ~(a-b==0)
print c

[[False  True  True  True  True]
 [ True False  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]

0人赞添加讨论(0) 举报

Find the set difference between two large arrays (

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间