How to plot the difference of two distributions in

Posted 2019-04-12 17:56

I have the following code to compare two distributions:

sns.kdeplot(df['term'][df['outcome'] == 0], shade=1, color='red')
sns.kdeplot(df['term'][df['outcome'] == 1], shade=1, color='green'); 

It looks like this:

enter image description here

How do I plot just the difference of the two distributions (disA - disB)? Of course, it could contain negative values.

1 answer
等我变得足够好
#2 · 2019-04-12 18:34

Since the difference between two kde curves is not a kde curve itself, you cannot use kdeplot to plot that difference.
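One way to see this is a minimal sketch, assuming Gaussian KDEs from scipy.stats.gaussian_kde and purely synthetic samples (the names a, b, grid, and the normal distributions are illustrative and not from the question). Each KDE integrates to about 1, so the difference integrates to about 0 and must dip below zero somewhere, which a density cannot do.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic samples, purely illustrative (not the asker's data).
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(1.0, 2.0, 500)

kde_a = gaussian_kde(a)
kde_b = gaussian_kde(b)

# Wide, fine grid so both curves are essentially zero at the edges.
grid = np.linspace(-15.0, 15.0, 3001)
dx = grid[1] - grid[0]
diff = kde_a(grid) - kde_b(grid)

# Simple Riemann sums approximate the integrals over the grid.
area_a = np.sum(kde_a(grid)) * dx   # close to 1 for a proper density
area_diff = np.sum(diff) * dx       # close to 0 for the difference
has_negative = bool((diff < 0).any())

print(area_a, area_diff, has_negative)
```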

A kde is easily calculated using scipy.stats.gaussian_kde. The result is easily plotted with pyplot.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(0)  # reproducible sample data

# Two synthetic samples standing in for the two distributions
a = np.random.gumbel(80, 25, 1000)
b = np.random.gumbel(90, 46, 4000)

# Fit a Gaussian KDE to each sample
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

# Evaluate both KDEs once on a common grid
grid = np.linspace(0, 500, 501)
da = kdea(grid)
db = kdeb(grid)

plt.plot(grid, da, label="kde A")
plt.plot(grid, db, label="kde B")
plt.plot(grid, da - db, label="difference")

plt.legend()
plt.show()

enter image description here

Note that the result is just the pointwise difference between the two curves (as asked for); it has no statistical meaning in itself.
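Applied to the DataFrame from the question, the same idea might look like the sketch below. The helper kde_difference and the synthetic df are hypothetical stand-ins; only the column names term and outcome come from the question. Both groups are evaluated on one shared grid, which is what makes the subtraction meaningful.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def kde_difference(df, value_col="term", group_col="outcome", num=501):
    """Return (grid, kde0, kde1, kde0 - kde1) on a grid shared by both groups."""
    x0 = df.loc[df[group_col] == 0, value_col].dropna().to_numpy()
    x1 = df.loc[df[group_col] == 1, value_col].dropna().to_numpy()
    # Shared grid spanning the observed range of the value column.
    grid = np.linspace(df[value_col].min(), df[value_col].max(), num)
    d0 = gaussian_kde(x0)(grid)
    d1 = gaussian_kde(x1)(grid)
    return grid, d0, d1, d0 - d1


# Hypothetical stand-in for the asker's DataFrame (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "term": np.concatenate([rng.normal(36, 6, 300), rng.normal(48, 10, 300)]),
    "outcome": np.repeat([0, 1], 300),
})

grid, d0, d1, diff = kde_difference(df)
```

The returned arrays can then be drawn with plt.plot(grid, diff), optionally with plt.axhline(0) as a reference line. The curves may differ slightly from what sns.kdeplot draws, since bandwidth and grid choices are not guaranteed to match.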
