What I am trying to produce is something similar to this plot:
It is a contour plot whose levels enclose 68%, 95%, and 99.7% of the particles in two data sets.
So far, I have tried computing a Gaussian KDE of the particles and plotting the resulting density as contours.
Files are added here https://www.dropbox.com/sh/86r9hf61wlzitvy/AABG2mbmmeokIiqXsZ8P76Swa?dl=0
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
# My data
x = RelDist
y = RadVel
# Perform the kernel density estimate
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
fig = plt.figure()
ax = fig.gca()
CS = ax.contour(xi, yi, zi.reshape(xi.shape), colors='darkslateblue')
plt.clabel(CS, inline=1, fontsize=10)
ax.set_xlim(20, 800)
ax.set_ylim(-450, 450)
ax.set_xscale('log')
plt.show()
Producing this:
The problems are: 1) I do not know how to control the bin number in the Gaussian KDE, 2) the contour labels are all zero, and 3) I have no idea how to determine the percentiles.
Any help is appreciated.
Adapting this example from the matplotlib documentation,
you can rescale your density zi to the 0-1 range (so each value is a fraction of the peak density) and then draw the contours.
You can also set the contour levels manually when you call plt.contour().
Below is an example built from two bivariate normal densities:
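import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import multivariate_normal

delta = 0.025
x = y = np.arange(-3.0, 3.01, delta)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))
# plt.mlab.bivariate_normal is gone from recent matplotlib releases;
# scipy's multivariate_normal gives the same densities (covariance = sigma**2)
Z1 = multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]).pdf(pos)
Z2 = multivariate_normal([1.0, 1.0], [[1.5**2, 0.0], [0.0, 0.5**2]]).pdf(pos)
Z = 10 * (Z1 - Z2)
# transform Z to a 0-1 range
Z = (Z - Z.min()) / (Z.max() - Z.min())
levels = [0.68, 0.95, 0.997]
origin = 'lower'
CS = plt.contour(X, Y, Z, levels,
                 colors=('k',),
                 linewidths=(3,),
                 origin=origin)
plt.clabel(CS, fmt='%2.3f', colors='b', fontsize=14)
plt.show()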
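Note that levels on a 0-1 rescaled density are fractions of the peak height, which is not the same thing as the fraction of particles inside each contour. If the contours should enclose 68%, 95%, and 99.7% of the estimated density, one possible approach is to sort the gridded density values and find the threshold at which the cumulative sum reaches each fraction. The sketch below is only an illustration: the helper name `mass_enclosing_levels` is made up here, the data are synthetic stand-ins for RelDist and RadVel, and it assumes an evenly spaced grid so that every cell carries the same area.

```python
import numpy as np
from scipy.stats import gaussian_kde


def mass_enclosing_levels(density, fractions):
    """Density thresholds whose interior holds the given fractions of the gridded density.

    Hypothetical helper; assumes `density` was evaluated on an evenly spaced grid.
    """
    flat = np.sort(density.ravel())[::-1]        # highest density first
    cumulative = np.cumsum(flat) / flat.sum()    # enclosed fraction as the threshold drops
    idx = np.searchsorted(cumulative, fractions)
    idx = np.clip(idx, 0, flat.size - 1)
    return flat[idx]


# Synthetic stand-in data (assumption): 2000 points from a standard bivariate normal
rng = np.random.default_rng(0)
sample = rng.normal(size=(2, 2000))

kde = gaussian_kde(sample)
xi, yi = np.mgrid[-5:5:100j, -5:5:100j]
zi = kde(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)

fractions = [0.68, 0.95, 0.997]
# contour() needs increasing levels, so sort: the 99.7% threshold is the lowest density
levels = np.sort(mass_enclosing_levels(zi, fractions))
```

The resulting `levels` array can then be passed as `levels=levels` to `plt.contour(xi, yi, zi, ...)` in place of the fixed 0.68, 0.95, 0.997 values. Because these thresholds are raw density values, they can be very small, so a `clabel` format such as `fmt='%.2e'` is likely to be more readable than a fixed-point one.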
The same rescaling approach works with the data you provided:
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
RadVel = np.loadtxt('RadVel.txt')
RelDist = np.loadtxt('RelDist.txt')
x = RelDist
y = RadVel
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
# rescale zi to the 0-1 range
zi = (zi - zi.min()) / (zi.max() - zi.min())
zi = zi.reshape(xi.shape)
# set up the plot
origin = 'lower'
levels = [0, 0.1, 0.25, 0.5, 0.68, 0.95, 0.975, 1]
CS = plt.contour(xi, yi, zi, levels=levels,
                 colors=('k',),
                 linewidths=(1,),
                 origin=origin)
plt.clabel(CS, fmt='%.3f', colors='b', fontsize=8)
plt.xscale('log')
plt.xlim(10, 1000)
plt.ylim(-200, 200)
plt.show()
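On point 1 of the question: gaussian_kde has no bin count as such. Two separate knobs are involved. The smoothing is set by the `bw_method` argument of `scipy.stats.gaussian_kde` (the strings 'scott' or 'silverman', a scalar, or a callable), and the resolution of the plotted surface is set by the complex step in `np.mgrid`. A minimal sketch, using synthetic stand-in data and arbitrary illustrative values (0.1 for the bandwidth factor, 80 grid points per axis):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in data (assumption), shape (n_dims, n_points)
rng = np.random.default_rng(1)
sample = rng.normal(size=(2, 500))

kde_default = gaussian_kde(sample)                # default: Scott's rule
kde_narrow = gaussian_kde(sample, bw_method=0.1)  # a scalar is used directly as the bandwidth factor

# Grid resolution is independent of the bandwidth:
# the complex step in np.mgrid is the number of points along each axis
n_grid = 80
xi, yi = np.mgrid[-4:4:n_grid * 1j, -4:4:n_grid * 1j]
zi = kde_narrow(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)

print(kde_default.factor, kde_narrow.factor)
```

A smaller factor gives a spikier density that follows individual particles more closely, and a larger one gives smoother contours. Which value suits your data is a judgment call, so it is probably worth trying a few.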