Plotting contour lines that show percentage of par

Posted 2019-07-14 10:57

Question:

What I am trying to produce is a contour plot showing the regions that contain 68%, 95%, and 99.7% of the particles in two data sets.

So far, I have tried a Gaussian kernel density estimate (KDE) and plotted the resulting density as contours.

The data files are here: https://www.dropbox.com/sh/86r9hf61wlzitvy/AABG2mbmmeokIiqXsZ8P76Swa?dl=0

from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np

# My data
x = RelDist
y = RadVel

# Perform the kernel density estimate
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))



fig = plt.figure()
ax = fig.gca()


CS = ax.contour(xi, yi, zi.reshape(xi.shape), colors='darkslateblue')
plt.clabel(CS, inline=1, fontsize=10)

ax.set_xlim(20, 800)
ax.set_ylim(-450, 450)
ax.set_xscale('log')

plt.show()

This produces a plot with three problems: 1) I do not know how to control the bin number in the Gaussian KDE, 2) the contour labels are all zero, and 3) I do not know how to determine the percentile levels.

Any help is appreciated.

Answer 1:

This is adapted from an example in the matplotlib documentation.

You can rescale your density zi to a 0-1 scale and then draw the contour plot.

You can also set the levels of the contour plot manually when you call plt.contour().

Below is an example using two bivariate normal distributions:

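import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

delta = 0.025
x = y = np.arange(-3.0, 3.01, delta)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))
# mlab.bivariate_normal was removed in matplotlib 3.1; scipy gives the same densities
Z1 = multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]).pdf(pos)
Z2 = multivariate_normal([1.0, 1.0], [[1.5**2, 0.0], [0.0, 0.5**2]]).pdf(pos)
Z = 10 * (Z1 - Z2)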

# rescale Z to the 0-1 range
Z = (Z - Z.min()) / (Z.max() - Z.min())

levels =  [0.68, 0.95, 0.997] 
origin = 'lower'
CS = plt.contour(X, Y, Z, levels,
              colors=('k',),
              linewidths=(3,),
              origin=origin)

plt.clabel(CS, fmt='%2.3f', colors='b', fontsize=14)
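
One thing to keep in mind about the rescaling above: after min-max scaling, a level of 0.68 marks where the density is 68% of its peak, which is not the same as a contour enclosing 68% of the particles. If enclosed fractions are what you want, one common approach is to sort the gridded density, accumulate it, and read off the density value at which the running total reaches each fraction. A minimal sketch, assuming a uniformly spaced grid; `mass_levels` is a hypothetical helper name, not a library function, and the standard bivariate normal is only a stand-in density:

```python
import numpy as np
from scipy.stats import multivariate_normal


def mass_levels(density, fractions):
    """Density thresholds whose super-level sets hold the given mass fractions.

    Assumes `density` is sampled on a grid with equal-area cells
    (hypothetical helper, written for illustration only).
    """
    flat = np.sort(density.ravel())[::-1]      # densities, highest first
    cum = np.cumsum(flat) / flat.sum()         # enclosed fraction of total mass
    idx = np.searchsorted(cum, fractions)      # first cell reaching each fraction
    idx = np.minimum(idx, flat.size - 1)       # guard against round-off at 1.0
    return np.sort(flat[idx])                  # ascending, as contour() expects


# Stand-in density: standard bivariate normal on a uniform grid
x = y = np.arange(-5.0, 5.01, 0.025)
X, Y = np.meshgrid(x, y)
Z = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf(np.dstack((X, Y)))

fractions = [0.68, 0.95, 0.997]
levels = mass_levels(Z, fractions)  # lowest level encloses 99.7%, highest 68%
```

The levels can then be passed straight to `plt.contour(X, Y, Z, levels=levels)`. To label them with percentages rather than raw densities, `plt.clabel` accepts a dict for `fmt`, for example `fmt=dict(zip(CS.levels, ['99.7%', '95%', '68%']))`.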

Using the data you provided, the code works just as well:

from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np

RadVel = np.loadtxt('RadVel.txt')
RelDist = np.loadtxt('RelDist.txt')
x = RelDist
y = RadVel

k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))

# rescale zi to the 0-1 range
zi = (zi - zi.min()) / (zi.max() - zi.min())
zi = zi.reshape(xi.shape)

#set up plot
origin = 'lower'
levels = [0,0.1,0.25,0.5,0.68, 0.95, 0.975,1]

CS = plt.contour(xi, yi, zi,levels = levels,
              colors=('k',),
              linewidths=(1,),
              origin=origin)

plt.clabel(CS, fmt='%.3f', colors='b', fontsize=8)
plt.xlim(10, 1000)
plt.xscale('log')
plt.ylim(-200, 200)
plt.show()
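
On the first point in the question (controlling the "bin number"): `gaussian_kde` has no bins. Its smoothness is set by the bandwidth, which can be passed as `bw_method`, while the resolution of the contour plot is set by the number of grid points given to `np.mgrid`. A minimal sketch, using synthetic samples as stand-ins for RelDist and RadVel (an assumption, since the real files are not loaded here); the values 0.5 and 80 are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic stand-ins for RelDist and RadVel (assumed, not the real data)
x = rng.lognormal(mean=5.0, sigma=0.5, size=1000)
y = rng.normal(loc=0.0, scale=100.0, size=1000)
data = np.vstack([x, y])

# Default bandwidth (Scott's rule) versus an explicit scalar factor
k_default = gaussian_kde(data)
k_smooth = gaussian_kde(data, bw_method=0.5)

# Grid resolution is chosen independently of the bandwidth
n_grid = 80
xi, yi = np.mgrid[x.min():x.max():n_grid * 1j,
                  y.min():y.max():n_grid * 1j]
zi = k_smooth(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
```

A smaller `bw_method` follows the data more closely and a larger one smooths more. The raw `zi` values are very small here because the density integrates to 1 over a wide area, which is likely why contour labels printed with three decimals all read as zero.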