I'm plotting about 10,000 items in an array, with around 1,000 unique values.
The plotting has been running for half an hour now. I made sure the rest of the code works.
Is it really that slow? This is my first time plotting histograms with pyplot.
To plot histograms quickly with matplotlib, pass the histtype='step' argument to pyplot.hist. For example:
import matplotlib.pyplot as plt
import numpy as np
plt.hist(np.random.exponential(size=1000000), bins=10000)
plt.show()
takes ~15 seconds to draw and roughly 5-10 seconds to update when you pan or zoom.
In contrast, plotting with histtype='step':
plt.hist(np.random.exponential(size=1000000),bins=10000,histtype='step')
plt.show()
plots almost immediately and can be panned and zoomed with no delay.
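One likely reason for the difference: with the default histtype='bar', hist creates one rectangle patch per bin, while histtype='step' draws a single polygon outline. Here is a rough sketch to check that on your own machine. The helper time_hist, the Agg backend, and the smaller sizes are my own choices so that it runs headless and finishes quickly. Timings will vary, so only the artist counts are really meaningful.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np


def time_hist(data, bins, histtype):
    """Hypothetical helper: time one hist call plus a forced render."""
    fig, ax = plt.subplots()
    start = time.perf_counter()
    _, _, patches = ax.hist(data, bins=bins, histtype=histtype)
    fig.canvas.draw()  # force rendering so the drawing cost is included
    elapsed = time.perf_counter() - start
    n_artists = len(patches)  # number of patch objects hist created
    plt.close(fig)
    return elapsed, n_artists


rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)

bar_time, bar_artists = time_hist(data, 1000, "bar")
step_time, step_artists = time_hist(data, 1000, "step")

print(f"bar:  {bar_artists} artists, {bar_time:.3f} s")
print(f"step: {step_artists} artists, {step_time:.3f} s")
```

The more bins you ask for, the more rectangles the 'bar' type has to create and redraw on every pan or zoom, which would match the delays described above.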
It will be instant to plot the histogram after flattening the numpy array. Try the below demo code:
import matplotlib.pyplot as plt
import numpy as np
array2d = np.random.random_sample((512,512))*100
plt.hist(array2d.flatten())
plt.hist(array2d.flatten(), bins=1000)
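The reason flattening matters is that pyplot.hist treats each column of a 2D array as a separate dataset, so a 512x512 array becomes 512 histograms drawn side by side. A small sketch to see this, using a made-up 20x8 array (small2d is just a stand-in for the array above, and the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
small2d = rng.random((20, 8)) * 100  # hypothetical small stand-in array

fig, (ax_2d, ax_flat) = plt.subplots(1, 2)

# 2D input: each of the 8 columns is histogrammed as its own dataset
counts_2d, _, _ = ax_2d.hist(small2d)

# Flattened input: one dataset holding all 160 values
counts_flat, _, _ = ax_flat.hist(small2d.flatten())

shape_2d = np.asarray(counts_2d).shape
shape_flat = np.asarray(counts_flat).shape
print("2D input counts shape:  ", shape_2d)
print("flattened counts shape: ", shape_flat)
plt.close(fig)
```

With the default 10 bins, the 2D call returns one row of counts per column, while the flattened call returns a single row.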
Importing seaborn somewhere in the code may cause pyplot.hist to take a really long time.
If the problem is seaborn, it can be solved by resetting the matplotlib settings:
import seaborn as sns
sns.reset_orig()
For me, the problem was that the dtype of my pd.Series, say S, was 'object' rather than 'float64'. After converting it with S = np.float64(S), plt.hist(S) is very quick!
For me it took calling figure.canvas.draw() after the call to hist to make the figure update immediately. hist itself was actually fast (I discovered that after timing it), but there was a delay of a few seconds before the figure was updated. I was calling hist inside a matplotlib callback in a JupyterLab cell (qt5 backend).
For anyone running into the issue I had (which was totally my bad :)): if you're dealing with numbers, make sure when reading from CSV that your data type is int/float, not string.
values_arr = .... .flatten().astype('float')
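A quick way to check whether this is your problem is to look at the array's dtype before plotting. A minimal sketch, assuming the CSV was read in as strings (the csv_text contents and the use of np.loadtxt are made up for illustration, not taken from my actual code):

```python
import io

import numpy as np

csv_text = "1.5,2.0,3.25\n4.0,5.5,6.75\n"  # hypothetical CSV contents

# Reading with dtype=str reproduces the "numbers stored as strings" situation
raw = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=str)
print(raw.dtype.kind)  # 'U' means a unicode string array, not numbers

# Flatten to 1-D and convert to float before handing it to plt.hist
values_arr = raw.flatten().astype("float")
print(values_arr.dtype, values_arr.shape)
```

If dtype.kind comes back as 'U', 'S' or 'O' instead of 'f' or 'i', convert before plotting.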
If you are working with pandas, make sure the data you pass to plt.hist() is a 1-D Series rather than a DataFrame. This helped me out.
I was facing the same problem using the pandas .hist() method. For me the solution was:
pd.to_numeric(df['your_data']).hist()
This worked instantly.
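If the column contains a few non-numeric entries, pd.to_numeric raises by default. A small sketch of what the conversion does, using a made-up DataFrame (the values, and the choice of errors="coerce" to turn bad entries into NaN, are my assumptions rather than part of the original fix):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: numbers stored as strings, plus one bad entry
df = pd.DataFrame({"your_data": ["1.5", "2.0", "n/a", "4.0"]})
was_numeric = is_numeric_dtype(df["your_data"])
print("numeric before:", was_numeric)

# errors="coerce" replaces unparseable entries with NaN instead of raising
numeric = pd.to_numeric(df["your_data"], errors="coerce")
n_bad = int(numeric.isna().sum())
print("dtype after:", numeric.dtype, "| unparseable entries:", n_bad)

# numeric.hist() can now be called on real floats
```

Checking the count of NaN values afterwards tells you how many entries were silently dropped from the histogram.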