Getting data of a box plot - Matplotlib

Posted 2020-07-22 04:26

Question:

I have to plot a boxplot of some data, which I could easily do with Matplotlib. However, I was requested to provide a table with the data presented there, like the whiskers, the medians, standard deviation, and so on.

I know that I could calculate these "by hand", but I also know, from the reference, that the boxplot method:

Returns a dictionary mapping each component of the boxplot to a list of the matplotlib.lines.Line2D instances created. That dictionary has the following keys (assuming vertical boxplots):

boxes: the main body of the boxplot showing the quartiles and the median’s confidence intervals if enabled.
medians: horizontal lines at the median of each box.
whiskers: the vertical lines extending to the most extreme, non-outlier data points.
caps: the horizontal lines at the ends of the whiskers.
fliers: points representing data that extend beyond the whiskers (outliers).

So I'm wondering how I could get these values, since they are stored as matplotlib.lines.Line2D instances.

Thank you.

Answer 1:

As you've figured out, you need to access the members of the return value of boxplot.

For example, if the return value is stored in bp:

bp['medians'][0].get_ydata()

>> array([ 2.5,  2.5])

As the boxplot is vertical, and the median line is therefore a horizontal line, you only need to focus on one of the y-values; i.e. the median is 2.5 for my sample data.
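To make the snippet above reproducible, here is a minimal self-contained sketch. The sample data and the use of the non-interactive Agg backend are assumptions made for illustration; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data whose median is 2.5
data = np.array([1.0, 2.0, 3.0, 4.0])

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # dictionary of lists of Line2D objects

# The median line is horizontal, so both of its y-values are identical
ydata = bp['medians'][0].get_ydata()
median = float(ydata[0])
print(median)  # 2.5

plt.close(fig)
```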

For each key in the dictionary, the value is a list, so that multiple boxes can be handled. If you have just one box, the list has only one element, hence my use of bp['medians'][0] above. If you have multiple boxes in your boxplot, you will need to iterate over them, for example:

for medline in bp['medians']:
    linedata = medline.get_ydata()
    median = linedata[0]

Unfortunately, CT Zhu's answer doesn't work, because the different elements behave differently. For example, each box has only one median but two whiskers, so it's safest to handle each quantity manually as outlined above.
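As a rough sketch of what handling each quantity manually could look like, the helper below collects one row per box. The name boxplot_table and the sample data are hypothetical. The sketch assumes a vertical boxplot drawn with the default settings (no notches, patch_artist left off), where each box has two consecutive entries in bp['whiskers'] (lower, then upper). The standard deviation is not drawn in a box plot, so it would still have to be computed from the raw data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def boxplot_table(bp):
    """Hypothetical helper: one dict of statistics per box (vertical, default style)."""
    rows = []
    for i in range(len(bp['medians'])):
        box_y = bp['boxes'][i].get_ydata()               # box outline spans q1..q3
        low_y = bp['whiskers'][2 * i].get_ydata()        # runs from q1 down to the whisker end
        high_y = bp['whiskers'][2 * i + 1].get_ydata()   # runs from q3 up to the whisker end
        rows.append({
            'q1': float(min(box_y)),
            'median': float(bp['medians'][i].get_ydata()[0]),
            'q3': float(max(box_y)),
            'whisker_low': float(min(low_y)),
            'whisker_high': float(max(high_y)),
            'fliers': [float(y) for y in bp['fliers'][i].get_ydata()],
        })
    return rows


# Hypothetical sample data: two groups, the first one containing an outlier
groups = [[1, 2, 3, 4, 5, 20], [2, 4, 6, 8]]

fig, ax = plt.subplots()
bp = ax.boxplot(groups)
rows = boxplot_table(bp)
for row in rows:
    print(row)

plt.close(fig)
```

The resulting list of dictionaries can be written out as a table, for instance with the csv module.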

NB: the closest you can come is the following:

res = {}
for key, value in bp.items():
    res[key] = [v.get_data() for v in value]

or equivalently

res = {key : [v.get_data() for v in value] for key, value in bp.items()}
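For reference, here is a small sketch of what that dictionary holds. Each entry is the (xdata, ydata) pair returned by Line2D.get_data(), so the values still have to be picked out per element type. The sample data and the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bp = ax.boxplot([1.0, 2.0, 3.0, 4.0])  # hypothetical sample data, a single box

res = {key: [v.get_data() for v in value] for key, value in bp.items()}

# One median line per box, stored as an (xdata, ydata) pair
med_x, med_y = res['medians'][0]
print(med_y)

# Two whisker lines per box, so this list is twice as long
print(len(res['whiskers']))

plt.close(fig)
```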