I'm completely new to seaborn, so apologies if this is a simple question, but I cannot find anywhere in the documentation a description of how the levels plotted by n_levels are controlled in kdeplot. This is an example:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
x,y=np.random.randn(2,10000)
fig,ax=plt.subplots()
sns.kdeplot(x,y, shade=True,shade_lowest=False, ax=ax,n_levels=3,cmap="Reds")
plt.show()
This is the resulting plot:
I would like to be able to know what confidence levels are shown, so that I can label my plot "shaded regions show the (a,b,c) percentage confidence intervals." I would naively assume that n_levels is somehow related to equivalent "sigmas" in a Gaussian, but from the example that doesn't look to be the case.
Ideally, I would like to be able to specify the intervals shown by passing a tuple to kdeplot, such as:
levels=[68,95,99]
and plot these confidence regions.
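As a rough sanity check on the "sigmas" intuition: for an isotropic 2D Gaussian, the probability mass inside a radius of k sigma is 1 - exp(-k^2/2), which differs from the familiar 1D values of 68/95/99.7. The sketch below only illustrates that relation; the function names are made up for this example and are not part of seaborn.

```python
import numpy as np

def mass_within_sigma_2d(k):
    """Probability mass of an isotropic 2D Gaussian inside radius k*sigma."""
    return 1.0 - np.exp(-0.5 * np.asarray(k, dtype=float) ** 2)

def sigma_for_mass_2d(p):
    """Radius (in units of sigma) that encloses probability mass p in 2D."""
    return np.sqrt(-2.0 * np.log(1.0 - np.asarray(p, dtype=float)))

print(mass_within_sigma_2d([1, 2, 3]))         # roughly 0.39, 0.86, 0.99
print(sigma_for_mass_2d([0.68, 0.95, 0.99]))   # roughly 1.51, 2.45, 3.03
```

So in 2D a 68% region extends to about 1.5 sigma rather than 1 sigma, which is worth keeping in mind when judging whether plotted contours look "too wide".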
EDIT: Thanks to @Goyo and @tom I think I can clarify my question, and come partway to the answer I am looking for. As pointed out, n_levels is passed to plt.contourf as levels, and so a list can be passed. But sns.kdeplot plots the PDF, and the values in the PDF don't correspond to the confidence intervals I am looking for (since these correspond to integration of the PDF).
What I need to do is pass sns.kdeplot the x,y values of the integrated (and normalized) PDF, and then I will be able to enter e.g. n_levels=[0.68,0.95,0.99,1].
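One way to make the link between PDF values and enclosed probability concrete is to sort the gridded density from highest to lowest and accumulate it until the desired mass is reached. This is a minimal sketch under the assumption that the density is available on a regular grid; `levels_for_mass` and `cell_area` are names invented for this example, and the demo uses an analytic Gaussian rather than seaborn's KDE.

```python
import numpy as np

def levels_for_mass(density, cell_area, masses):
    """Density thresholds whose super-level sets enclose the given masses."""
    flat = np.sort(density.ravel())[::-1]   # highest density first
    cum = np.cumsum(flat) * cell_area       # mass enclosed as the threshold drops
    cum /= cum[-1]                          # renormalise for grid truncation
    idx = np.searchsorted(cum, masses)      # first cell that reaches each mass
    return flat[idx]

# Demo on an analytic standard 2D Gaussian evaluated on a regular grid
grid = np.linspace(-6, 6, 601)
xx, yy = np.meshgrid(grid, grid)
pdf = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)
cell_area = (grid[1] - grid[0]) ** 2

levels = levels_for_mass(pdf, cell_area, [0.68, 0.95, 0.99])
print(levels)
```

Note that the returned thresholds decrease as the enclosed mass increases, while matplotlib's contour functions expect levels in increasing order, so the list would need to be reversed before being used as contour levels.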
EDIT 2: I have now solved this problem. See below. I use a 2d normed histogram to define the confidence intervals, which I then pass as levels to the normed kde plot. Apologies for the repetition: I could have written a function to return the levels, but I typed it all out explicitly.
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
import seaborn as sns
# Generate some random data
x,y=np.random.randn(2,100000)
# Make a 2d normed histogram
H,xedges,yedges=np.histogram2d(x,y,bins=40,normed=True)
norm=H.sum() # Find the norm of the sum
# Set contour levels
contour1=0.99
contour2=0.95
contour3=0.68
# Set target levels as percentage of norm
target1 = norm*contour1
target2 = norm*contour2
target3 = norm*contour3
# Take histogram bin membership as proportional to Likelihood
# This is true when data comes from a Markovian process
def objective(limit, target):
    w = np.where(H>limit)
    count = H[w]
    return count.sum() - target
# Find levels by summing histogram to objective
level1= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target1,))
level2= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target2,))
level3= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target3,))
# For nice contour shading with seaborn, define top level
level4=H.max()
levels=[level1,level2,level3,level4]
# Pass levels to normed kde plot
fig,ax=plt.subplots()
sns.kdeplot(x,y, shade=True,ax=ax,n_levels=levels,cmap="Reds_d",normed=True)
ax.set_aspect('equal')
plt.show()
The resulting plot is now the following:
The levels are slightly wider than I expect, but I think this is correct.
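The repeated steps above could be folded into a single helper, roughly as follows. This is only a sketch of the same histogram-plus-bisection idea: `histogram_levels` is a made-up name, and `density=True` is assumed here as the newer spelling of `normed=True` in `np.histogram2d`.

```python
import numpy as np
import scipy.optimize

def histogram_levels(x, y, fractions, bins=40):
    """Density thresholds enclosing the given fractions of a 2D histogram."""
    # Assumption: density=True stands in for the older normed=True
    H, xedges, yedges = np.histogram2d(x, y, bins=bins, density=True)
    norm = H.sum()

    def objective(limit, target):
        # Mass in bins above `limit`, minus the mass we are aiming for
        return H[H > limit].sum() - target

    return [
        scipy.optimize.bisect(objective, H.min(), H.max(), args=(norm * f,))
        for f in fractions
    ]

rng = np.random.default_rng(0)
x, y = rng.standard_normal((2, 100000))
levels = histogram_levels(x, y, [0.99, 0.95, 0.68])
print(levels)
```

Passing the fractions from largest to smallest returns the thresholds in increasing order, which is the order contour levels need; the top level (`H.max()`) can still be appended afterwards for the shading.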