How can I compute the probability at a point given

Posted 2020-07-23 06:44

Question:

Is there a package in Perl that lets you compute the height of a probability distribution at a given point? For example, this can be done in R this way:

> dnorm(0, mean=4,sd=10)
[1] 0.03682701

Namely, the probability that the point x=0 falls into a normal distribution with mean=4 and sd=10 is 0.0368. I looked at Statistics::Distributions, but it doesn't provide a function that does this.

Answer 1:

Why not something along these lines (I am writing in R, but it could be done in Perl with Statistics::Distributions):

dn <- function(x=0 # value
               ,mean=0 # mean 
               ,sd=1 # sd
               ,sc=10000 ## scale the precision
               ) {
  res <- (pnorm(x+1/sc, mean=mean, sd=sd)-pnorm(x, mean=mean, sd=sd))*sc
  res
}
> dn(0,4,10,10000)
0.03682709
> dn(2.02,2,.24)
1.656498

[edit:1] I should mention that this approximation can get pretty horrible in the far tails. It might or might not matter, depending on your application.

[edit:2] @foolishbrat Turned the code into a function. The result should always be positive. Perhaps you are forgetting that in the Perl module you mention, the function returns the upper probability 1-F, while R returns F?

[edit: 3] fixed a copy and paste error.
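
The same finite-difference idea can be sketched in Perl. This is only an illustration, not the answerer's code: it assumes Perl 5.22 or later, where the core POSIX module exports erfc, and the helper names pnorm, dn_forward and dn_central are made up for this example. The central difference is included because its error shrinks with the square of the step size, whereas the forward difference used in the R function above has an error proportional to the step.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # assumption: Perl >= 5.22, where POSIX provides erfc

# Normal CDF: F(x) = 0.5 * erfc(-(x - mean) / (sd * sqrt(2)))
sub pnorm {
    my ($x, $mean, $sd) = @_;
    return 0.5 * erfc(-($x - $mean) / ($sd * sqrt(2)));
}

# Forward difference, mirroring the R function dn: (F(x + h) - F(x)) / h, h = 1/sc
sub dn_forward {
    my ($x, $mean, $sd, $sc) = @_;
    $sc //= 10000;
    return (pnorm($x + 1 / $sc, $mean, $sd) - pnorm($x, $mean, $sd)) * $sc;
}

# Central difference: (F(x + h) - F(x - h)) / (2h)
sub dn_central {
    my ($x, $mean, $sd, $sc) = @_;
    $sc //= 10000;
    my $h = 1 / $sc;
    return (pnorm($x + $h, $mean, $sd) - pnorm($x - $h, $mean, $sd)) / (2 * $h);
}

printf "%.6f\n", dn_forward(0, 4, 10);
printf "%.6f\n", dn_central(0, 4, 10);
```

Both calls should print a value close to the 0.0368 that dnorm(0, mean=4, sd=10) returns in R. The caveat about the far tails still applies, since both variants subtract two nearly equal CDF values.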



Answer 2:

dnorm(0, mean=4, sd=10) does not give you the probability of such a point occurring. To quote Wikipedia on the probability density function:

In probability theory, a probability density function (pdf), often referred to as a probability distribution function, or density, of a random variable is a function that describes the density of probability at each point in the sample space. The probability of a random variable falling within a given set is given by the integral of its density over the set.

and the probability you mention is

R> pnorm(0, 4, 10)
[1] 0.3446

or a 34.46% chance of getting a value equal to or smaller than 0 from a N(4, 10) distribution.

As for your Perl question: If you know how to do it in R, but need it from Perl, maybe you need to write a Perl extension based on R's libRmath (provided in Debian by the package r-mathlib) to get those functions to Perl? This does not require the R interpreter.

Otherwise, you could try the GNU GSL or the Cephes libraries for access to these special functions.



Answer 3:

If you really want the density function, why not use it directly:

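use strict; use warnings;

my $pi   = 3.141593;
my $x    = 2.02;
my $mean = 2;
my $sd   = .24;
# normal density: exp(-(x-mean)^2 / (2 sd^2)) / (sd * sqrt(2 pi))
print 1/($sd * sqrt(2*$pi)) * exp(-($x-$mean)**2 / (2 * $sd**2));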

It gives 1.65649768474891, about the same as dnorm in R.



Answer 4:

I don't think Jouni is quite right. This seems to give a reasonable version of the PDF (extract the middle of the loop if you just want a specific x-y point):

#!/usr/bin/perl

use strict;
use Getopt::Std;
use POSIX qw(ceil floor);

# Usage
# Outputs normal density function given a mean and sd
# -s standard deviation
# -m mean
# -n normalization factor (multiply result by this amount), optional

my %para = ();
getopts('s:m:n:', \%para);
if (!exists ($para{'s'}) || !exists ($para{'m'})) {
   die ("mean and standard deviation required");
}

my $norm = 1.0;
if (exists ($para{'n'})) {
   $norm = $para{'n'};
}

my $sd = $para{'s'};
my $mean = $para{'m'};

my $start = floor($mean - ($sd * 5));
my $end = ceil($mean + ($sd * 5));

my $pi = 3.141593;

my $var = $sd**2;

for (my $x = $start; $x < $end; $x+=0.1) {
    my $e = exp( -1 * (($x-$mean)**2) / (2*$var));
    my $d = sqrt($var) * sqrt(2*$pi);
    my $y = 1.0/$d*$e * $norm;
    printf ("%5.5f %5.5f\n", $x, $y);
}


Answer 5:

As others have pointed out, you probably want the cumulative distribution function. This can be obtained via the error function (shifted by the mean and scaled by the standard deviation of your normal distribution), which exists in the standard math library and is made accessible in Perl by Math::Libm.
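
As a rough sketch of that approach: the answer suggests Math::Libm, but the example below swaps in erf from the core POSIX module, which is an assumption that requires Perl 5.22 or later. The names normal_cdf and normal_interval are hypothetical helpers for this illustration.

```perl
use strict;
use warnings;
use POSIX qw(erf);    # assumption: Perl >= 5.22; Math::Libm also exports an erf

# Shift by the mean, scale by sd * sqrt(2), then map erf's range (-1, 1) to (0, 1):
# F(x) = 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))
sub normal_cdf {
    my ($x, $mean, $sd) = @_;
    return 0.5 * (1 + erf(($x - $mean) / ($sd * sqrt(2))));
}

# Probability of landing in [lo, hi]: F(hi) - F(lo)
sub normal_interval {
    my ($lo, $hi, $mean, $sd) = @_;
    return normal_cdf($hi, $mean, $sd) - normal_cdf($lo, $mean, $sd);
}

printf "%.4f\n", normal_cdf(0, 4, 10);              # P(X <= 0); compare pnorm(0, 4, 10) in R
printf "%.4f\n", normal_interval(-6, 14, 4, 10);    # P(X within one sd of the mean)
```

The first value should agree with the 0.3446 that pnorm(0, 4, 10) returns in R, and the second with the familiar figure of about 68% for one standard deviation either side of the mean.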



Answer 6:

Using Perl's Statistics::Distributions, you can achieve this with:

#!/usr/bin/perl

use strict; use warnings;
use Statistics::Distributions qw(uprob);

my $x       = 0;
my $mean    = 4;
my $stdev   = 10;

print "Height of probability distribution at point $x = "
    . (1-uprob(($x-$mean)/$stdev))."\n";

This prints "Height of probability distribution at point 0 = 0.34458". Note that this value is the cumulative probability P(X <= 0), matching R's pnorm rather than dnorm.



Answer 7:

Here's how you can do the same thing you're doing with R in Perl using the Math::SymbolicX::Statistics::Distributions module from CPAN:

use strict; use warnings;

use Math::SymbolicX::Statistics::Distributions qw/normal_distribution/;

my $norm = normal_distribution(qw/mean sd/);
print $norm->value(mean => 4, sd => 10, x => 0), "\n";

# curry it with the parameter values
$norm->implement(mean => 4, sd => 10);
print $norm->value(x => 0),"\n"; # prints the same as above

The normal_distribution() function from that module is a generator for functions. $norm will be a Math::Symbolic (::Operator) object that you can modify, for example with implement, which in the above example replaces the two parameter variables with constants.

Note, however, as Dirk pointed out, that you probably want the cumulative function of the normal distribution or, more generally, the integral over a certain range.

Unfortunately, Math::Symbolic can't do integration symbolically. Therefore, you'd have to resort to numerical integration with the likes of Math::Integral::Romberg. (Alternatively, search CPAN for an implementation of the error function.) This may be slow, but it's still easy to do. Add this to the above snippet:

use Math::Integral::Romberg 'integral';

my ($int_sub) = $norm->to_sub(); # compile to a faster Perl sub
print $int_sub->(0),"\n";  # same number as above

print "p=" . integral($int_sub, -100., 0) . "\n";
# -100 is an arbitrary, small number

This should give you the ~0.344578258389676 from Dirk's answer.
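
If installing Math::Symbolic and Math::Integral::Romberg is not an option, the same numerical integration can be sketched with core Perl only. The example below swaps Romberg integration for composite Simpson's rule, so it is a different technique from the one used above, and normal_density and simpson are hypothetical helper names for this illustration.

```perl
use strict;
use warnings;

my $PI = 4 * atan2(1, 1);

# Normal density: exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 pi))
sub normal_density {
    my ($x, $mean, $sd) = @_;
    return exp(-($x - $mean)**2 / (2 * $sd**2)) / ($sd * sqrt(2 * $PI));
}

# Composite Simpson's rule for the integral of $f over [$lo, $hi] with $n subintervals
sub simpson {
    my ($f, $lo, $hi, $n) = @_;
    $n++ if $n % 2;                       # Simpson's rule needs an even count
    my $h   = ($hi - $lo) / $n;
    my $sum = $f->($lo) + $f->($hi);
    for my $i (1 .. $n - 1) {
        # interior points alternate between weight 4 (odd) and 2 (even)
        $sum += ($i % 2 ? 4 : 2) * $f->($lo + $i * $h);
    }
    return $sum * $h / 3;
}

# As above, -100 stands in for minus infinity
my $p = simpson(sub { normal_density($_[0], 4, 10) }, -100, 0, 2000);
printf "p=%.6f\n", $p;
```

With 2000 subintervals the result should agree with the ~0.3446 from Dirk's answer to several decimal places. The lower bound only needs to sit far enough below the mean that the density there is negligible.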