可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Is there a package in Perl that allows you to compute the height of probability distribution at each given point. For example this can be done in R this way:
> dnorm(0, mean=4,sd=10)
> 0.03682701
Namely the probability of point x=0 falls into a normal distribution, with mean=4 and sd=10, is 0.0368.
I looked at Statistics::Distribution but it doesn't give that very
function to do it.
回答1:
Why not something along these lines (I am writing in R, but it could be done in perl with Statistics::Distribution):
dn <- function(x=0 # value
,mean=0 # mean
,sd=1 # sd
,sc=10000 ## scale the precision
) {
res <- (pnorm(x+1/sc, mean=mean, sd=sd)-pnorm(x, mean=mean, sd=sd))*sc
res
}
> dn(0,4,10,10000)
0.03682709
> dn(2.02,2,.24)
1.656498
[edit:1] I should mention that this approximation can get pretty horrible at the far tails. it might or might not matter depending on your application.
[edit:2] @foolishbrat Turned the code into a function. The result should always be positive. Perhaps you are forgetting that in the perl module you mention the function returns the upper probability 1-F, and R returns F?
[edit: 3] fixed a copy and paste error.
回答2:
dnorm(0, mean=4, sd=10) does not give you thr probability of such a point occurring. To quote Wikipedia on probability density function
In probability theory, a probability
density function (pdf)—often referred
to as a probability distribution
function1—or density, of a random
variable is a function that describes
the density of probability at each
point in the sample space. The
probability of a random variable
falling within a given set is given by
the integral of its density over the
set.
and the probability you mention is
R> pnorm(0, 4, 10)
[1] 0.3446
or a 34.46% chance of getting a value equal to or smaller than 0 from a N(4, 10) distribution.
As for your Perl question: If you know how to do it in R, but need it from Perl, maybe you need to write a Perl extension based on R's libRmath (provided in Debian by the package r-mathlib) to get those functions to Perl? This does not require the R interpreter.
Otherwise, you could try the GNU GSL or the Cephes libraries for access to these special functions.
回答3:
If you really want the density function, why not use it directly:
$pi = 3.141593;
$x = 2.02;
$mean = 2;
$sd = .24;
print 1/($sd * sqrt(2*$pi)) * exp(-($x-$mean)**2 / (2 * $sd**2));
It gives 1.65649768474891 about the same as dnorm in R.
回答4:
I don't think Jouni is quite right. This seems to give a reasonable version of the PDF (extract the middle of the loop if you just want a specific x-y point):
!/usr/bin/perl
use strict;
use Getopt::Std;
use POSIX qw(ceil floor);
# Usage
# Outputs normal density function given a mean and sd
# -s standard deviation
# -m mean
# -n normalization factor (multiply result by this amount), optional
my %para = ();
getopts('s:m:n:', \%para);
if (!exists ($para{'s'}) || !exists ($para{'m'})) {
die ("mean and standard deviation required");
}
my $norm = 1.0;
if (exists ($para{'n'})) {
$norm = $para{'n'};
}
my $sd = $para{'s'};
my $mean = $para{'m'};
my $start = floor($mean - ($sd * 5));
my $end = ceil($mean + ($sd * 5));
my $pi = 3.141593;
my $var = $sd**2;
for (my $x = $start; $x < $end; $x+=0.1) {
my $e = exp( -1 * (($x-$mean)**2) / (2*$var));
my $d = sqrt($var) * sqrt(2*$pi);
my $y = 1.0/$d*$e * $norm;
printf ("%5.5f %5.5f\n", $x, $y);
}
回答5:
As others have pointed out, you probably want the cumulative distribution function. This can be obtained via the error function (shifted by the mean and scaled by the standard deviation of your normal distribution), which exists in the standard math library and is made accessible in Perl by Math::Libm.
回答6:
Using Perl's Statistics::Distributions, you can achieve this with:
#!/usr/bin/perl
use strict; use warnings;
use Statistics::Distributions qw(uprob);
my $x = 0;
my $mean = 4;
my $stdev = 10;
print "Height of probablility distribution at point $x = "
. (1-uprob(($x-$mean)/$stdev))."\n";
Results with "Height of probablility distribution at point 0 = 0.34458"
回答7:
Here's how you can do the same thing you're doing with R in Perl using the Math::SymbolicX::Statistics::Distributions module from CPAN:
use strict; use warnings;
use Math::SymbolicX::Statistics::Distributions qw/normal_distribution/;
my $norm = normal_distribution(qw/mean sd/);
print $norm->value(mean => 4, sd => 10, x => 0), "\n";
# curry it with the parameter values
$norm->implement(mean => 4, sd => 10);
print $norm->value(x => 0),"\n"; # prints the same as above
The normal_distribution() function from that module is a generator for functions. $norm will be a Math::Symbolic (::Operator) object that you can modify. For example with implement, which, in the above example, replaces the two parameter variables with constants.
Note, however as Dirk pointed out, that you probably want the cumulative function of the normal distribution. Or more generally the integral in a certain range.
Unfortunately, Math::Symbolic can't do integration symbolically. Therefore, you'd have to resort to numerical integration with the likes of Math::Integral::Romberg. (Alternatively, search CPAN for an implementation of the error function.) This may be slow, but it's still easy to do. Add this to the above snippet:
use Math::Integral::Romberg 'integral';
my ($int_sub) = $norm->to_sub(); # compile to a faster Perl sub
print $int_sub->(0),"\n"; # same number as above
print "p=" . integral($int_sub, -100., 0) . "\n";
# -100 is an arbitrary, small number
This should give you the ~0.344578258389676 from Dirk's answer.