I've been reading through the masked array documentation and I'm confused - what is different about MaskedArray than just maintaining an array of values and a boolean mask? Can someone give me an example where MaskedArrays are way more convenient, or higher performing?
Update 6/5
To be more concrete about my question, here is the classic example of how one uses a MaskedArray:
>>>data = np.arange(12).reshape(3, 4)
>>>mask = np.array([[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]])
>>>masked = np.ma.array(data, mask=mask)
>>>masked
masked_array(
data=[[0, 1, --, 3],
[4, 5, 6, --],
[8, --, 10, 11]],
mask=[[False, False, True, False],
[False, False, False, True],
[False, True, False, False]],
fill_value=999999)
>>>masked.sum(axis=0)
masked_array(data=[12, 6, 16, 14], mask=[False, False, False, False], fill_value=999999)
I could just as easily well do the same thing this way:
>>>data = np.arange(12).reshape(3, 4).astype(float)
>>>mask = np.array([[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]]).astype(bool)
>>>masked = data.copy() # this keeps the original data reuseable, as would
# the MaskedArray. If we only need to perform one
# operation then we could avoid the copy
>>>masked[mask] = np.nan
>>>np.nansum(masked, axis=0)
array([12., 6., 16., 14.])
I suppose the MaskedArray version looks a bit nicer, and avoids the copy if you need a reuseable array. Doesn't it use just as much memory when converting from standard ndarray to MaskedArray? And does it avoid the copy under the hood when applying the mask to the data? Are there other advantages?