I need to find unique rows in a numpy.array.
For example:
>>> a  # I have
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])
>>> new_a  # I want to get to
array([[1, 1, 1, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 0]])
I know that I can create a set and loop over the array, but I am looking for an efficient pure numpy solution. I believe there is a way to set the data type to void so that I could just use numpy.unique, but I couldn't figure out how to make it work.
Beyond @Jaime's excellent answer, another way to collapse a row is to use a.strides[0] (assuming a is C-contiguous), which is equal to a.dtype.itemsize*a.shape[1], the number of bytes in one row. Furthermore, void(n) is a shortcut for dtype((void,n)). Together, these give the shortest version of this approach.
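The void-view idea described above can be sketched roughly as follows. This is an illustration rather than the answer's exact one-liner: the names row_bytes, rows_as_void and first_idx are made up for this example, and the np.ascontiguousarray call is added so the C-contiguity assumption actually holds.

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# The view below only makes sense for a C-contiguous array.
a = np.ascontiguousarray(a)

# Bytes per row; for a C-contiguous 2-D array this equals
# a.dtype.itemsize * a.shape[1].
row_bytes = a.strides[0]

# Reinterpret each row as one opaque void item: shape (n_rows, 1).
rows_as_void = a.view(np.dtype((np.void, row_bytes)))

# Indices of the first occurrence of each distinct row.
_, first_idx = np.unique(rows_as_void, return_index=True)

# Sort the indices so the unique rows keep their original order.
unique_rows = a[np.sort(first_idx)]
print(unique_rows)
```

Rows are compared byte by byte here, so for floating-point data values such as 0.0 and -0.0 would count as different. In NumPy 1.13 and later, np.unique(a, axis=0) returns the unique rows directly, in sorted order.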
Based on the answer on this page, I have written a function that replicates the capability of MATLAB's unique(input,'rows') function, with the additional feature of accepting a tolerance for checking uniqueness. It also returns the indices such that c = data[ia,:] and data = c[ic,:]. Please report if you see any discrepancies or errors.
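The function itself is not shown here, so the following is only a minimal sketch of how such a tolerance-aware row uniqueness check might look. It is not the author's implementation. The name unique_rows_tol, the default tol, and the choice to snap values to a grid of width tol are all assumptions made for this illustration.

```python
import numpy as np

def unique_rows_tol(data, tol=1e-8):
    """Hypothetical helper: unique rows of a 2-D array up to a tolerance.

    Returns (c, ia, ic) with c = data[ia, :] and data ~= c[ic, :]
    (equal up to roughly tol).
    """
    data = np.asarray(data, dtype=float)
    # Assumption: snap every value to an integer grid of width tol, so
    # values in the same bin compare equal. Two values closer than tol
    # can still land in adjacent bins if they straddle a boundary.
    keys = np.round(data / tol).astype(np.int64)
    _, ia, ic = np.unique(keys, axis=0, return_index=True, return_inverse=True)
    ic = np.ravel(ic)  # keep the inverse 1-D across NumPy versions
    c = data[ia, :]
    return c, ia, ic

data = np.array([[1.0, 2.0],
                 [1.0 + 1e-12, 2.0],
                 [3.0, 4.0],
                 [1.0, 2.0]])
c, ia, ic = unique_rows_tol(data, tol=1e-8)
print(c)
print(ia, ic)
```

With a nonzero tolerance, data = c[ic,:] holds only approximately, since each group of near-equal rows is represented by its first member.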