A kernel density estimator is used to estimate a particular probability density function (see the mvstat.net and scikit-learn docs for references).
My confusion is about what exactly kde2d() does. Does it estimate the joint probability density function f(a, b) of two random variables, as in the example below? And what does the color mean?
Here is the code example I am referring to.
library(MASS)  # kde2d() is provided by the MASS package
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n=100)
colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
filled.contour(density, color.palette=colour_flow)
What is a kernel density estimator? Essentially, it places a little normal density curve over every data point (each curve centred on its point) and then adds up all these little normal densities, rescaled so that the total integrates to 1, to obtain the kernel density estimate.
For the sake of illustration, here is an image of a one-dimensional kernel density estimator from one of your links.
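The one-dimensional construction can also be reproduced by hand. The following is only a minimal sketch, assuming a normal kernel and the rule-of-thumb bandwidth from bw.nrd0(); the names x, h, gx and bumps are made up for this illustration and do not come from the question.

```r
set.seed(1)                       # reproducible illustration
x  <- log10(rgamma(200, 6, 3))    # a sample from the same family as b in the question
h  <- bw.nrd0(x)                  # rule-of-thumb bandwidth = sd of each little normal
gx <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)  # evaluation points

# One normal curve per data point, evaluated at every grid value:
# rows index grid values, columns index data points (512 x 200).
bumps <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = h))

# The estimate is the average of the little curves (sum divided by n),
# which is what makes it integrate to roughly 1.
kde_manual <- rowMeans(bumps)

# Compare with the built-in estimator on the same range and bandwidth.
# density() uses a binned approximation, so the match is close, not exact.
kde_builtin <- density(x, bw = h, from = min(gx), to = max(gx), n = 512)
max_diff <- max(abs(kde_manual - kde_builtin$y))

# Riemann-sum check that the manual estimate is a proper density
area <- sum(kde_manual) * diff(gx[1:2])
```

Plotting kde_manual against gx, with a few columns of bumps divided by length(x) drawn underneath, gives a picture like the one described above.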
What about 2 dimensional kernel densities?
The function creates a grid from min(a) to max(a) and from min(b) to max(b). Instead of fitting a tiny 1D normal density over every value in a or b, kde2d now fits a tiny 2D normal density over every data point (a[i], b[i]) and evaluates the result at every point of the grid. Just like in the one-dimensional case, it then adds up all the density values.

What do the colours mean? As @cel pointed out in the comments: the estimated density depends on two variables, so we have three axes now (a, b and the estimated density). One way to visualize three axes is by using iso-probability contours, that is, lines along which the estimated density is constant. This sounds fancy, but it is basically the same as the high/low pressure images we know from the weather forecast.

You are using the palette colour_flow, defined in your code as colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')).
So from low to high, the plot will be coloured white, blue, yellow, red and eventually darkred for the highest values of the estimated density. This results in the following plot:
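To make the two-dimensional construction and the colour mapping concrete, here is a rough sketch that rebuilds the kde2d() surface by hand and then draws it with explicit levels. It assumes the defaults of MASS::kde2d() (a product of two normal kernels, with bandwidth.nrd() divided by 4 as the kernel standard deviation) and the defaults of filled.contour() (levels chosen with pretty() over the range of z). The names ha, hb, ka, kb, z_manual and cols are hypothetical helpers for this illustration only.

```r
library(MASS)   # kde2d() and bandwidth.nrd()

set.seed(1)     # reproducible illustration
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))

dens <- kde2d(a, b, n = 100)   # list with grid vectors x, y and matrix z

# Kernel standard deviations as used by kde2d() when h is not supplied
ha <- bandwidth.nrd(a) / 4
hb <- bandwidth.nrd(b) / 4

# 2D normal (product) kernel centred on each data point, evaluated on the grid:
# ka[i, k] is the a-part at grid value i for data point k, kb likewise for b.
ka <- outer(dens$x, a, function(g, ai) dnorm(g, mean = ai, sd = ha))
kb <- outer(dens$y, b, function(g, bi) dnorm(g, mean = bi, sd = hb))

# Sum the kernels over all data points and divide by n
z_manual <- tcrossprod(ka, kb) / length(a)
max_diff <- max(abs(z_manual - dens$z))

# The surface is a joint density estimate: its volume should be close to 1
cell   <- diff(dens$x[1:2]) * diff(dens$y[1:2])
volume <- sum(dens$z) * cell

# Colour mapping: split the range of z into bands, one colour per band
colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
levels <- pretty(range(dens$z), 20)
cols   <- colour_flow(length(levels) - 1)
filled.contour(dens, levels = levels, col = cols)
```

If max_diff is essentially zero, the hand-built surface agrees with kde2d(), which supports reading z as an estimate of the joint density f(a, b). Each colour band then covers one interval of estimated density, from white near zero up to darkred around the peak.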