How can I efficiently calculate the binomial cumul

Let's say that I know the probability of a "success" is P. I run the test N times, and I see S successes. The test is akin to tossing an unevenly weighted coin (perhaps heads is a success, tails is a failure).

I want to know the approximate probability of seeing either S successes, or a number of successes less likely than S successes.

So for example, if P is 0.3, N is 100, and I get 20 successes, I'm looking for the probability of getting 20 or fewer successes.

If, on the other hadn, P is 0.3, N is 100, and I get 40 successes, I'm looking for the probability of getting 40 our more successes.

I'm aware that this problem relates to finding the area under a binomial curve, however:

My math-fu is not up to the task of translating this knowledge into efficient code
While I understand a binomial curve would give an exact result, I get the impression that it would be inherently inefficient. A fast method to calculate an approximate result would suffice.

I should stress that this computation has to be fast, and should ideally be determinable with standard 64 or 128 bit floating point computation.

I'm looking for a function that takes P, S, and N - and returns a probability. As I'm more familiar with code than mathematical notation, I'd prefer that any answers employ pseudo-code or code.

标签： algorithm math probability binomial-cdf

10条回答

姐就是有狂的资本

2楼-- · 2020-02-02 07:09

The DCDFLIB Project has C# functions (wrappers around C code) to evaluate many CDF functions, including the binomial distribution. You can find the original C and FORTRAN code here. This code is well tested and accurate.

If you want to write your own code to avoid being dependent on an external library, you could use the normal approximation to the binomial mentioned in other answers. Here are some notes on how good the approximation is under various circumstances. If you go that route and need code to compute the normal CDF, here's Python code for doing that. It's only about a dozen lines of code and could easily be ported to any other language. But if you want high accuracy and efficient code, you're better off using third party code like DCDFLIB. Several man-years went into producing that library.

0人赞添加讨论(0) 举报

在下西门庆

3楼-- · 2020-02-02 07:14

I can't totally vouch for the efficiency, but Scipy has a module for this

from scipy.stats.distributions import binom
binom.cdf(successes, attempts, chance_of_success_per_attempt)

0人赞添加讨论(0) 举报

We Are One

4楼-- · 2020-02-02 07:16

I think you want to evaluate the incomplete beta function.

There's a nice implementation using a continued fraction representation in "Numerical Recipes In C", chapter 6: 'Special Functions'.

0人赞添加讨论(0) 举报

淡お忘

5楼-- · 2020-02-02 07:16

If you are using Python, no need to code it yourself. Scipy got you covered:

from scipy.stats import binom
# probability that you get 20 or less successes out of 100, when p=0.3
binom.cdf(20, 100, 0.3)
>>> 0.016462853241869434

# probability that you get exactly 20 successes out of 100, when p=0.3
binom.pmf(20, 100, 0.3)
>>> 0.0075756449257260777

0人赞添加讨论(0) 举报

Animai°情兽

6楼-- · 2020-02-02 07:18

Try this one, used in GMP. Another reference is this.

0人赞添加讨论(0) 举报

冷血范

7楼-- · 2020-02-02 07:21

From the portion of your question "getting at least S heads" you want the cummulative binomial distribution function. See http://en.wikipedia.org/wiki/Binomial_distribution for the equation, which is described as being in terms of the "regularized incomplete beta function" (as already answered). If you just want to calculate the answer without having to implement the entire solution yourself, the GNU Scientific Library provides the function: gsl_cdf_binomial_P and gsl_cdf_binomial_Q.

0人赞添加讨论(0) 举报

1 2 下一页

How can I efficiently calculate the binomial cumul

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间