I'm looking for the most efficient way of finding the intersection of two different-sized matrices. Each matrix has three variables (columns) and a varying number of observations (rows). For example, matrices a and b:
import numpy as np
a = np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
b = np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
If I set the tolerance for each column as col1 = 1, col2 = 2, and col3 = 10, I would want a function that outputs the indices in a and b that are within their respective tolerances, for example:
x1, x2 = func(a, b, col1, col2, col3)
print(x1)
>> [2 3]
print(x2)
>> [1 3]
You can see from the indices that row 2 of a is within the tolerances of row 1 of b.
I'm thinking I could loop through each row of matrix a and check whether it's within the tolerances of each row in b. But that seems inefficient for very large data sets.
Any suggestions for alternatives to a looping method for accomplishing this?
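If you don't mind working with NumPy arrays, you could exploit broadcasting for a vectorized solution: take the absolute difference between every row of a and every row of b, compare it against the per-column tolerances, and keep the row pairs where all three columns pass.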
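A minimal sketch of the broadcasting approach, using the matrices from the question. The function name match_within_tol and the tol argument are made up for illustration, and the inclusive comparison (<=) is an assumption about what "within tolerance" means; switch to < if the bound should be strict.

```python
import numpy as np

def match_within_tol(a, b, tol):
    """Return (x1, x2): indices of rows in a and b that agree within tol in every column."""
    a = np.asarray(a)  # also converts np.matrix input to a plain 2D array
    b = np.asarray(b)
    # a[:, None, :] has shape (na, 1, ncols) and b has shape (nb, ncols),
    # so broadcasting yields all pairwise differences: shape (na, nb, ncols).
    diffs = np.abs(a[:, None, :] - b)
    # A pair of rows matches only if every column is within its tolerance.
    # Assumption: the bound is inclusive.
    mask = (diffs <= np.asarray(tol)).all(axis=2)
    return np.nonzero(mask)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])

x1, x2 = match_within_tol(a, b, tol=[1, 2, 10])
print(x1)  # [2 3]
print(x2)  # [1 3]
```

The intermediate diffs array holds na * nb * ncols values, so memory grows with the product of the two row counts.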
Large data sizes: If you are working with huge data sizes that cause memory issues, and since you already know that the number of columns is small (3), you might want a minimal loop of 3 iterations instead. Each iteration handles one column and updates a 2D boolean mask, which avoids building the full 3D array of differences and saves a huge memory footprint.
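One way the column-by-column loop could look, as a sketch rather than a definitive implementation. The name match_within_tol_lowmem is hypothetical, and the inclusive comparison (<=) is again an assumption.

```python
import numpy as np

def match_within_tol_lowmem(a, b, tol):
    """Same result as the fully broadcast version, but only (na, nb) temporaries."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Start with every pair marked as a match, then rule pairs out column by column.
    mask = np.ones((a.shape[0], b.shape[0]), dtype=bool)
    for i in range(a.shape[1]):  # 3 iterations for the 3-column case
        # (na, 1) minus (nb,) broadcasts to (na, nb) for this single column.
        # Assumption: the tolerance bound is inclusive.
        mask &= np.abs(a[:, i, None] - b[:, i]) <= tol[i]
    return np.nonzero(mask)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])

x1, x2 = match_within_tol_lowmem(a, b, tol=[1, 2, 10])
print(x1)  # [2 3]
print(x2)  # [1 3]
```

The largest temporary here has na * nb elements, so peak memory is roughly a third of the 3D differences array in the 3-column case, at the cost of a short Python loop.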