I've been reading through the masked array documentation and I'm confused: how is a MaskedArray different from just maintaining an array of values and a boolean mask? Can someone give me an example where MaskedArrays are far more convenient, or perform better?
Update 6/5
To be more concrete about my question, here is the classic example of how one uses a MaskedArray:
>>> import numpy as np
>>> data = np.arange(12).reshape(3, 4)
>>> mask = np.array([[0., 0., 1., 0.],
...                  [0., 0., 0., 1.],
...                  [0., 1., 0., 0.]])
>>> masked = np.ma.array(data, mask=mask)
>>> masked
masked_array(
  data=[[0, 1, --, 3],
        [4, 5, 6, --],
        [8, --, 10, 11]],
  mask=[[False, False,  True, False],
        [False, False, False,  True],
        [False,  True, False, False]],
  fill_value=999999)
>>> masked.sum(axis=0)
masked_array(data=[12, 6, 16, 14], mask=[False, False, False, False], fill_value=999999)
I could just as easily do the same thing this way:
>>> data = np.arange(12).reshape(3, 4).astype(float)
>>> mask = np.array([[0., 0., 1., 0.],
...                  [0., 0., 0., 1.],
...                  [0., 1., 0., 0.]]).astype(bool)
>>> masked = data.copy()  # this keeps the original data reusable, as would
...                       # the MaskedArray. If we only need to perform one
...                       # operation then we could avoid the copy
>>> masked[mask] = np.nan
>>> np.nansum(masked, axis=0)
array([12.,  6., 16., 14.])
I suppose the MaskedArray version looks a bit nicer, and avoids the copy if you need a reusable array. But doesn't it use just as much memory when converting from a standard ndarray to a MaskedArray? And does it avoid the copy under the hood when applying the mask to the data? Are there other advantages?
The official answer is reported here:
In fact, masked arrays can be quite slow compared to the analogous array of nans:
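A rough way to check this on your own machine is sketched below. The array size, the 10% mask density and the repeat count are arbitrary choices made for illustration, and the timings will depend on the NumPy version and on the operation, so treat this as a starting point rather than a benchmark.

```python
import timeit

import numpy as np

# Illustrative data: sizes and mask density are arbitrary assumptions.
rng = np.random.default_rng(0)
data = rng.random((500, 500))
mask = rng.random((500, 500)) < 0.1

# Same information stored two ways: a MaskedArray and a NaN-filled copy.
masked = np.ma.array(data, mask=mask)
with_nans = data.copy()
with_nans[mask] = np.nan

# Time the same column-wise reduction on both representations.
t_masked = timeit.timeit(lambda: masked.sum(axis=0), number=10)
t_nan = timeit.timeit(lambda: np.nansum(with_nans, axis=0), number=10)

print(f"masked.sum: {t_masked:.4f} s")
print(f"np.nansum:  {t_nan:.4f} s")
```

Both calls compute the same column sums, so only the run time differs.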
When are they useful?
In many years of programming, I found them useful on the following occasions:
I use np.nan for missing values, but I also mask the values with poor SNR, so I can identify both.

In general, you can think of a masked array as a more compact representation. The best approach is to test, case by case, which solution is more comprehensible and efficient.
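As a minimal sketch of the "NaN for missing, mask for poor SNR" idea: the sample values, the `snr` array and `snr_threshold` below are made up for illustration and are not taken from any real dataset.

```python
import numpy as np

# Hypothetical measurements: NaN marks samples that are missing entirely.
signal = np.array([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])
# Hypothetical per-sample signal-to-noise ratio and cutoff.
snr = np.array([10.0, 0.0, 2.0, 12.0, 0.0, 1.5])
snr_threshold = 3.0

# Mask only samples that exist but are too noisy; missing ones stay NaN.
low_snr = (snr < snr_threshold) & ~np.isnan(signal)
masked = np.ma.array(signal, mask=low_snr)

# The two kinds of "bad" samples remain distinguishable.
missing = np.isnan(masked.data)
poor = np.ma.getmaskarray(masked)
good = ~missing & ~poor

good_mean = signal[good].mean()
print("missing:", missing)
print("poor SNR:", poor)
print("mean of good samples:", good_mean)
```

Note that reductions such as `masked.mean()` skip masked entries but not NaNs, so the NaN positions still need to be excluded explicitly, as done with `good` here.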