I have a rank-1 numpy.array of which I want to make a boxplot. However, I want to exclude all values equal to zero in the array ... Currently, I solve this by looping over the array and copying each value to a new array if it is not equal to zero. However, as the array consists of 86 000 000 values and I have to do this multiple times, this takes a lot of patience.
Is there a more intelligent way to do this?
You can index with a Boolean array. For a NumPy array A, the expression A[A != 0] keeps only the nonzero values.

You can use Boolean array indexing as above, bool type conversion, np.nonzero, or np.where. Their performance can differ, so it is worth benchmarking them on your own data.

For a NumPy array a, you can use a[a != 0] to extract the values not equal to zero.
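The four approaches named above can be sketched as follows. This is a minimal illustration, not the original answer's benchmark: the sample values, the variable names, and the size of the timing array are all assumptions, and no timing results are claimed here since they depend on the machine and the data.

```python
import timeit

import numpy as np

# Small stand-in for the 86-million-element array; the values are made up.
A = np.array([0.0, 1.5, 0.0, 3.0, 0.0, 4.2])

by_mask = A[A != 0]            # explicit Boolean mask
by_bool = A[A.astype(bool)]    # zero converts to False, anything else to True
by_nonzero = A[np.nonzero(A)]  # index arrays of the nonzero entries
by_where = A[np.where(A)]      # one-argument np.where acts like np.nonzero

# Rough timing harness on a larger random array (size chosen arbitrarily).
rng = np.random.default_rng(0)
big = rng.integers(0, 3, size=1_000_000).astype(float)

candidates = {
    "mask": "big[big != 0]",
    "astype(bool)": "big[big.astype(bool)]",
    "np.nonzero": "big[np.nonzero(big)]",
    "np.where": "big[np.where(big)]",
}
for label, stmt in candidates.items():
    total = timeit.timeit(stmt, globals={"big": big, "np": np}, number=5)
    print(f"{label:14s} {total / 5:.4f} s per call")
```

All four return a new, shorter array, so the result can be passed straight to a boxplot call. None of them needs a Python-level loop, which is where the original approach loses its time.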
This is a case where you want to use masked arrays: they keep the shape of your array and are automatically recognized by many NumPy and matplotlib functions.
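A minimal sketch of the masked-array idea, assuming a small float array named `a` (a made-up example, not the asker's data). Because not every plotting function is guaranteed to honour the mask, the sketch also shows `compressed()`, which returns only the unmasked values as a plain array.

```python
import numpy as np

a = np.array([0.0, 2.0, 0.0, 5.0, 3.0])

# Mask every entry equal to 0; the array keeps its original shape.
masked = np.ma.masked_equal(a, 0)

# Reductions on the masked array skip the masked entries.
mean_nonzero = masked.mean()

# Plain ndarray holding only the unmasked values, e.g. for a boxplot call.
values = masked.compressed()
```

Here `masked.count()` gives the number of unmasked entries, which is a quick way to check how many zeros were excluded.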
I would suggest simply using NaN for cases like this, where you would like to ignore some values but still keep the procedure as statistically meaningful as possible.

A simple line of code can get you an array that excludes all '0' values:

Example:
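A minimal sketch of both ideas, assuming a float array named `a` with made-up values. The NaN replacement via `np.where` and the one-line Boolean filter are illustrations chosen here, not necessarily the lines the answers originally showed. Note that NaN only exists for floating-point dtypes, so an integer array would need `astype(float)` first.

```python
import numpy as np

a = np.array([0.0, 2.0, 0.0, 5.0, 3.0])

# NaN approach: zeros become NaN, the shape is preserved.
with_nan = np.where(a == 0, np.nan, a)

# NaN-aware reductions ignore the NaN entries.
median = np.nanmedian(with_nan)
mean = np.nanmean(with_nan)

# One-line filter: a new array without the zeros.
nonzero = a[a != 0]

# Equivalent filter starting from the NaN version.
finite = with_nan[~np.isnan(with_nan)]
```

For a boxplot, passing `nonzero` (or `finite`) is the simplest route, since the plotting call then only ever sees the values that should be drawn.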