When plotting columns of a dataframe with pandas, e.g. `df.boxplot()`, the automatic adjustment of the y-axis can lead to a large amount of unused space in the plot. I wonder if this is because the dataframe has points that exceed the boxplot whiskers (but for some reason the outliers aren't displayed). If that is the case, what would be a good way to automatically adjust `ylim` so that there isn't so much empty space in the plot?
I think a combination of the seaborn style and the way matplotlib draws boxplots is hiding your outliers here.
If I generate some skewed data and then use the `boxplot` method on the dataframe, I see something similar. But if you change the symbol used to plot outliers, the outliers become visible.
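A minimal sketch of that comparison, assuming log-normal samples as the skewed data and `"r."` (red dots) as the replacement symbol; the column names and sample size are made up for illustration. The `sym` keyword is forwarded by pandas to matplotlib's `boxplot`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: log-normal samples are right-skewed, so each column
# has many points above its upper whisker.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.lognormal(size=(200, 3)), columns=["a", "b", "c"])

fig, (ax_default, ax_marked) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
df.boxplot(ax=ax_default)            # outlier markers come from the active style
df.boxplot(ax=ax_marked, sym="r.")   # force outliers to be drawn as red dots
ax_default.set_title("default markers")
ax_marked.set_title('sym="r."')
```

Call `plt.show()` or `fig.savefig(...)` afterwards to look at the two panels side by side.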
Alternatively, you can use the seaborn `boxplot` function, which does the same thing but with some nice aesthetics.

Building on eumiro's answer in this SO post (I just extend it to pandas data frames), you could do the following:
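A minimal sketch of that idea, assuming the filter keeps only values within `m` standard deviations of their column mean. The function name `reject_outliers`, the default `m=2.0`, and the sample data are assumptions for illustration.

```python
import pandas as pd

def reject_outliers(df, m=2.0):
    """Replace values further than m standard deviations from the column mean with NaN."""
    deviation = (df - df.mean()).abs()
    # df.std() uses the sample standard deviation (ddof=1), unlike np.std.
    return df[deviation < m * df.std()]

# Hypothetical data: column "x" has one extreme value, column "y" has none.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0],
                   "y": [5.0, 6.0, 7.0, 8.0, 9.0]})
trimmed = reject_outliers(df, m=1.5)

# pandas drops NaN values per column before drawing the boxes.
ax = trimmed.boxplot()
```

Rejected values become NaN rather than being dropped row-wise, so an outlier in one column does not remove the other columns' values from the plot.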
The argument `m` is the number of standard deviations from the mean beyond which a point is ignored.

EDIT:
Why do the whiskers not include the maximum outliers in the first place?
There are several types of boxplots, as described on Wikipedia. The pandas `boxplot` calls matplotlib's `boxplot`. If you take a look at the documentation for this, the argument `whis` "Defines the length of the whiskers as a function of the inner quartile range." So it won't cover the entire range by design.