Histogram with stacked components

Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements have accumulated over certain non-overlapping subsets of the past 90 days. I want to do this by "subdividing" each bar of the histogram into chunks: one chunk for the earliest observations, one for the more recent ones, and one for the most recent.

This sounds like a job for df.plot(kind='bar', stacked=True) but I'm having trouble getting the details right.

Here's what I have so far:

import numpy as np
import pandas as pd
import seaborn as sbn

np.random.seed(0)

data = pd.DataFrame({'values': np.random.randn(90)})
# Integer bin index (0-14) for each value
data['bin'] = pd.cut(data['values'], 15, labels=False)
# Count observations per bin in each period; after groupby('bin') the bin
# is the index, so count the 'values' column
forhist = pd.DataFrame({'first70': data[:70].groupby('bin').count()['values'],
                        'next15': data[70:85].groupby('bin').count()['values'],
                        'last5': data[85:].groupby('bin').count()['values']})

forhist.plot(kind='bar', stacked=True)

And that gives me:

[image: poor result]

This graph has some shortcomings:

1. The bars are stacked in the wrong order: last5 should be on top and next15 in the middle, i.e. they should be stacked in the order of the columns in forhist.
2. There is horizontal space between the bars.
3. The x-axis is labeled with integers rather than with anything indicating the values the bins represent. My first choice would be to have the x-axis labeled exactly as it would be if I just ran data['values'].hist(). My second choice would be to label it with the bin names I would get from pd.cut(data['values'], 15). I used labels=False in my code because otherwise the bin-edge labels (as strings) would have become the bar labels, sorted alphabetically, which makes the graph basically useless.
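If you'd rather keep the stacked-bar idea, one way to address all three points is to bin by value instead of by category. Below is a minimal sketch, assuming only NumPy for the counting (`stacked_counts` and `plot_stacked` are hypothetical helper names, not pandas or matplotlib API). It computes one set of 15 bin edges from all 90 values, counts each period against those same edges with np.histogram, and then draws each period with ax.bar at the left bin edges, with width equal to the bin width and bottom equal to the running total of the earlier periods. The layers stack in the order the periods are passed, there are no gaps, and the x-axis is in data units, as with data['values'].hist().

```python
import numpy as np


def stacked_counts(values, splits, bins=15):
    """Count each period of `values` against one shared set of bin edges.

    `splits` is a list of positional (start, stop) pairs, e.g.
    [(0, 70), (70, 85), (85, 90)]. This helper is hypothetical.
    Returns (edges, counts); counts has one row per period.
    """
    values = np.asarray(values, dtype=float)
    # Edges computed once from all values, so every period is binned identically
    edges = np.histogram(values, bins=bins)[1]
    counts = np.array([np.histogram(values[lo:hi], bins=edges)[0]
                       for lo, hi in splits])
    return edges, counts


def plot_stacked(ax, edges, counts, labels):
    """Draw one bar layer per period on a matplotlib Axes, stacked in order."""
    widths = np.diff(edges)
    bottom = np.zeros(counts.shape[1])
    for row, label in zip(counts, labels):
        # align='edge' puts each bar's left side on its bin edge, and a width
        # equal to the bin width leaves no gaps between neighbouring bars
        ax.bar(edges[:-1], row, width=widths, bottom=bottom,
               align='edge', label=label)
        bottom = bottom + row  # next layer starts on top of this one
    ax.legend()
```

For the question's data this would be edges, counts = stacked_counts(data['values'], [(0, 70), (70, 85), (85, 90)]), followed by plot_stacked(ax, edges, counts, ['first70', 'next15', 'last5']) on an Axes from plt.subplots(). The ordering problem in forhist itself has a separate cause: older pandas versions sorted dict keys when building a DataFrame, which puts last5 between first70 and next15. Passing columns=['first70', 'next15', 'last5'] to pd.DataFrame pins the order explicitly.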

What's the best way to approach this? I feel like I'm using very clumsy functions so far.

Tags: python matplotlib pandas seaborn

1 answer

迷人小祖宗

#2 · 2020-02-28 18:47

Ok, here's one way to attack it, using features from the matplotlib hist function itself:

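import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# Slice each period by position (.iloc) so the three periods don't overlap;
# ax.hist computes one shared set of 15 bins for all three datasets
ax.hist([data['values'].iloc[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]],
        bins=15,
        stacked=True,   # stack the datasets in the order given
        rwidth=1.0,     # bars fill the full bin width (no gaps)
        label=['first70', 'next15', 'last5'])
ax.legend()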

Which gives:

[image: better]
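Whichever plotting route you take, the period slices need to be positional. With pandas' default integer index, label-based slicing (.loc, and .ix on an integer index; .ix itself was removed in pandas 1.0) includes both endpoints, so (0, 70) and (70, 85) would both contain row 70. .iloc slices by position and excludes the end. Here is a quick check, assuming a Series built like data['values'] with the default index:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.random.randn(90))  # default RangeIndex 0..89
periods = [(0, 70), (70, 85), (85, 90)]

# Label-based: both endpoints included, so neighbouring periods share a row
by_label = [len(values.loc[lo:hi]) for lo, hi in periods]
# Position-based: end excluded, so the periods partition the 90 rows
by_position = [len(values.iloc[lo:hi]) for lo, hi in periods]

print(by_label)     # [71, 16, 5]
print(by_position)  # [70, 15, 5]
```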
