From an array like `db` (which will have a shape of approximately `(1e6, 300)`) and a `mask = [1, 0, 1]` vector, I define the target as a 1 in the first column. I want to create an `out` vector that consists of ones where the corresponding row in `db` matches the `mask` and `target == 1`, and zeros everywhere else.
import numpy as np

db = np.array([  # out for mask = [1, 0, 1]
    # target, vector
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 1],  # 0 (fits the mask but target == 0)
    [0, 0, 1, 0],  # 0
    [1, 1, 0, 1],  # 1
    [0, 1, 1, 0],  # 0
    [1, 0, 0, 0],  # 0
])
I have defined a `vline` function that applies a `mask` to each array line using `np.array_equal(mask, mask & vector)` to check that vectors 101 and 111 fit the mask, then retains only the indices where `target == 1`.
`out` is initialized to a list of six zeros:
out = [0, 0, 0, 0, 0, 0]
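The `vline` function is defined as:
def vline(idx, mask):
    line = db[idx]
    target, vector = line[0], line[1:]
    # the vector fits the mask if every bit set in mask is also set in vector
    if np.array_equal(mask, mask & vector):
        if target == 1:
            out[idx] = 1  # modifies the global out in place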
I get the correct result by applying this function line-by-line in a `for` loop:
def check_mask(db, out, mask=[1, 0, 1]):
    # idx_db to iterate over db lines without enumerate
    for idx in np.arange(db.shape[0]):
        vline(idx, mask=mask)
    return out

assert check_mask(db, out, [1, 0, 1]) == [1, 0, 0, 1, 0, 0]  # it works!
Now I want to vectorize `vline` by creating a `ufunc`:
ufunc_vline = np.frompyfunc(vline, 2, 1)
out = [0, 0, 0, 0, 0, 0]
ufunc_vline(db, [1, 0, 1])
print(out)
But the `ufunc` complains about broadcasting inputs with those shapes:
In [217]: ufunc_vline(db, [1, 0, 1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-217-9008ebeb6aa1> in <module>()
----> 1 ufunc_vline(db, [1, 0, 1])

ValueError: operands could not be broadcast together with shapes (6,4) (3,)
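The shape mismatch reported in the traceback can be reproduced without `vline` at all. The following is a minimal sketch that relies only on NumPy's standard broadcasting rules; `can_broadcast` is a hypothetical helper written for this illustration, not a NumPy function. It suggests that the full `(6, 4)` array cannot be combined with a `(3,)` mask, whereas the `(6, 3)` block of vector columns can.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes are broadcast-compatible."""
    try:
        # np.broadcast raises ValueError when the shapes cannot be reconciled
        np.broadcast(np.empty(shape_a), np.empty(shape_b))
        return True
    except ValueError:
        return False


print(can_broadcast((6, 4), (3,)))  # db against mask: False
print(can_broadcast((6, 3), (3,)))  # db[:, 1:] against mask: True
```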
Converting `vline` to a NumPy ufunc fundamentally doesn't make sense, since ufuncs are always applied to NumPy arrays in an elementwise fashion. Because of this, the input arguments must either have the same shape or be broadcastable to the same shape. You are passing two arrays with incompatible shapes to your `ufunc_vline` function (`db.shape == (6, 4)` and `mask.shape == (3,)`), hence the `ValueError` you are seeing.

There are a couple of other issues with `ufunc_vline`:

- `np.frompyfunc(vline, 2, 1)` specifies that `vline` should return a single output argument, whereas `vline` actually returns nothing (but modifies `out` in place).
- You are passing `db` as the first argument to `ufunc_vline`, whereas `vline` expects the first argument to be `idx`, which is used as an index into the rows of `db`.

Also, bear in mind that creating a ufunc from a Python function using `np.frompyfunc` will not yield any noticeable performance benefit over a standard Python `for` loop. To see any serious improvement you would probably need to code the ufunc in a low-level language such as C (see this example in the documentation).

Having said that, your `vline` function can be easily vectorized using standard boolean array operations: compare `mask & vector` with `mask` across each row, then combine the result with the target column.
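A minimal sketch of such a vectorized version follows. The name `vline_vectorized` is introduced here for illustration, and the sketch assumes, as in the question, that `db` is an integer array of 0s and 1s whose first column is the target. It returns a new array instead of modifying `out` in place.

```python
import numpy as np

db = np.array([
    # target, vector
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])


def vline_vectorized(db, mask):
    """Return 1 for rows whose vector fits `mask` and whose target is 1."""
    mask = np.asarray(mask)
    target, vectors = db[:, 0], db[:, 1:]
    # mask has shape (3,) and vectors has shape (n, 3), so they broadcast row-wise
    fits = np.all((mask & vectors) == mask, axis=1)
    # keep only the fitting rows whose target is 1
    return (fits & (target == 1)).astype(int)


out = vline_vectorized(db, [1, 0, 1])
print(out)  # [1 0 0 1 0 0]
```

Because the comparison runs over whole columns at once, no Python-level loop over the rows is needed, which is where the speed-up over `np.frompyfunc` would be expected to come from on an array of roughly `(1e6, 300)`.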