Using distplot in Python

Posted 2019-03-13 09:38

Question:

What is the unit of the y-axis when using distplot to plot a histogram? I have plotted different histograms together with a normal fit and I see that in one case, it has a range of 0 to 0.9 while in another a range of 0 to 4.5.

Thank you.

Answer 1:

From help(sns.distplot):

norm_hist : bool, optional
    If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.

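A count is simply the number of observations in each bin. A density is scaled so that the total area of the bars (height times bin width, summed over all bins) is 1, so the y-axis unit is 1/(unit of x) [2]. The height of a single bar is therefore not capped at 1: when the bins are narrower than 1, bars must be taller than 1 for the areas to sum to 1. That is why one of your plots can run from 0 to 0.9 and another from 0 to 4.5; the data in the second are spread over a much narrower x-range. kde is on by default and implies norm_hist, so norm_hist changes the y-units only if you explicitly turn kde off: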

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

fig, axs = plt.subplots(figsize=(6,6), ncols=2, nrows=2)
data = np.random.randint(0,20,40)

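# Note: distplot is deprecated since seaborn 0.11 (histplot/displot replace it).
for row in (0, 1):        # row 0: kde off, row 1: kde on
    for col in (0, 1):    # col 0: norm_hist=False, col 1: norm_hist=True
        sns.distplot(data, kde=bool(row), norm_hist=bool(col), ax=axs[row, col])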

axs[0,0].set_ylabel('NO kernel density')
axs[1,0].set_ylabel('KDE on')
axs[1,0].set_xlabel('norm_hist=False')
axs[1,1].set_xlabel('norm_hist=True')
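
The density units can also be checked numerically without plotting. The following is a minimal sketch, not part of the original answer: it uses numpy.histogram with density=True, and the sample data, the bin count, and the helper name density_summary are assumptions chosen for illustration. It suggests that the bar areas always sum to 1, while the tallest bar can exceed 1 when the data occupy a narrow x-range.

```python
import numpy as np

# Illustrative data (assumed, not from the question): same shape, different spread.
rng = np.random.default_rng(0)
narrow = rng.normal(loc=0.0, scale=0.1, size=1000)   # spans well under 1 unit of x
wide = rng.normal(loc=0.0, scale=10.0, size=1000)    # spans tens of units of x


def density_summary(data, bins=10):
    """Return (tallest bar height, total bar area) of a density histogram."""
    heights, edges = np.histogram(data, bins=bins, density=True)
    widths = np.diff(edges)                    # width of each bin in x-units
    area = float(np.sum(heights * widths))     # height * width, summed over bins
    return float(heights.max()), area


narrow_peak, narrow_area = density_summary(narrow)
wide_peak, wide_area = density_summary(wide)

print(f"narrow data: tallest bar = {narrow_peak:.3f}, total area = {narrow_area:.3f}")
print(f"wide data:   tallest bar = {wide_peak:.3f}, total area = {wide_area:.3f}")
```

In both cases the total area comes out as 1; only the narrow data produce bars taller than 1, because each bin is much narrower than one x-unit.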

[2] clarification from mwaskom, thanks!