let's say I have this array A:
array([ 0.0019879 , -0.00172861, -0.00527226, 0.00639585, -0.00242005,
-0.00717373, 0.00371651, 0.00164218, 0.00034572, -0.00864304,
-0.00639585, 0.006828 , 0.00354365, 0.00043215, -0.00440795,
0.00544512, 0.00319793, 0.00164218, 0.00025929, -0.00155575,
0.00129646, 0.00259291, -0.0039758 , 0.00328436, 0.00207433,
0.0011236 , 0.00440795, 0.00164218, -0.00319793, 0.00233362,
0.00025929, 0.00017286, 0.0008643 , 0.00363008])
If I run:
np.histogram(A, bins=9, density=True)
as hist I get:
array([ 34.21952021, 34.21952021, 34.21952021, 34.21952021,
34.21952021, 188.20736116, 102.65856063, 68.43904042,
51.32928032])
The manual says:
"If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function."
I thought I had a good understanding of histograms and density functions but I really don't understand what those values represent or how they are calculated.
I need to reproduce those values with R, as I am porting some code between the two languages.
In R, you can use the
hist()
function to plot your histogram. Additionally,hist
is anS3
function that produces a list.Here is the default histogram produced by R with your vector
A
.Here is the histogram with an additional layer for the density curve.
Let us store the list
hist(A)
top
.We can now see the contents of the list
p
.The
density
refers to the theoretical density function value. This can exceed 1, but the area under the density curve should be equal to 1. The width of each bar is easily determined by the difference between the breakpoints (breaks
) of the bars in the histogram. Thus, if we multiply the width of each bar of the histogram by thep$density
, and add the results, we should get a sum of 1.