integrate() gives totally wrong number

2020-08-25 09:06发布

integrate() gives horribly wrong answer:

integrate(function (x) dnorm(x, -5, 0.07), -Inf, Inf, subdivisions = 10000L)
# 2.127372e-23 with absolute error < 3.8e-23

The return value should obviously be 1 (normal distrubution integrates to 1), but integrate() returns ridiculously small number, with wrong error reporting, and no warning...

Any ideas?

This seems the default integrate() is horribly buggy... and I just found this by chance! Is there any reliable R package to compute numerical integration?

EDIT: I tried package pracma and I see the same problem! :

require(pracma)
integral(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
# For infinite domains Gauss integration is applied!
# [1] 0

EDIT: Hmm... digging deeper, it seems that he has trouble to find the very narrow domain for the function which is numerically > 0. When I set the limits to certain (very close to 0, 1) quantiles, it starts to work:

integral(function (x) dnorm(x, -5, 0.07), qnorm(1e-10, -5, 0.07), qnorm(1 - 1e-10, -5, 0.07))

But anyway, this is quite horrible gotcha... wonder if there is any remedy for this.

标签: r
4条回答
做自己的国王
2楼-- · 2020-08-25 09:26

From the online documentation: "Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."

I take this to mean "caveat emptor". I notice that in your example, the absolute error is greater than value of the integral. Given that you know f(x) > 0 for all x, at least it's giving you the chance to spot that something has gone wrong. It's down to you to take the opportunity.

integrate( function(x) dnorm(x, -5, 0.07), -20, 10, subdivisions=1000L) 

Gives

1 with absolute error < 9.8e-07

The warning in the online doc says to me that, given your apparent definition of buggy, the answer to your question is "no, there is no reliable numerical intergration method. Not in R or any other language". No numerical integration technique should be used blindly. The user needs to check their inputs are sensible and the output is reasonable. It's no good believing an answer just because the computer gave it to you.

See also this post.

查看更多
冷血范
3楼-- · 2020-08-25 09:37

Expanding a little further on @r2evan's and @Limey's comments:

@Limey: for very general problems like this, there is simply no way to guarantee a generic solution.

One way to solve such problem is to use more knowledge of the properties of the integrand (@r2evans's answer); the answer referenced by @Limey goes into detail for a different problem.

One "gotcha" that you may not have thought of is that trying out a bunch of generic methods, tuning settings, etc. may mislead you into concluding that some settings/methods are generically better than the first one you tried that failed to get the right answer. (Methods that work may work better because they're generically better, but trying them on one example doesn't prove it!)

As an example, the description of pcubature() (in ?cubature::pcubature says

This algorithm is often superior to h-adaptive integration for smooth integrands in a few (<=3) dimensions, but is a poor choice in higher dimensions or for non-smooth integrands.

However, recall that pcubature() happens to fail for your example, which is a smooth low-dimensional case - exactly where pcubature() is supposed to perform better - which suggests that it may be just luck that hcubature() works and pcubature() doesn't in this case.

An illustration of how sensitive the results can be to parameters (lower/upper limits in this case):

library(emdbook)
cc <- curve3d(integrate( dnorm, mean=-5, sd=0.07,
        lower=x, upper=y, subdivisions=1000L)$value,
       xlim=c(-30,-10), ylim=c(0,30), n = c(61, 61),
       sys3d="image", col=c("black", "white"),
    xlab="lower", ylab="upper")

White squares are successful (integral=1), black squares are bad (integral=0).

enter image description here

查看更多
ら.Afraid
4楼-- · 2020-08-25 09:45

Try package cubature.

library(cubature)

hcubature(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
#$integral
#[1] 1
#
#$error
#[1] 9.963875e-06
#
#$functionEvaluations
#[1] 405
#
#$returnCode
#[1] 0

Note that function pcubature in the same package also returns 0.

From vignette("cubature"), section Introduction. My emphasis.

This R cubature package exposes both the hcubature and pcubature routines of the underlying C cubature library, including the vectorized interfaces.

Per the documentation, use of pcubature is advisable only for smooth integrands in dimensions up to three at most. In fact, the pcubature routines perform significantly worse than the vectorized hcubature in inappropriate cases. So when in doubt, you are better off using hcubature.

Since in this case the integrand is the normal density, a smooth and 1-dimensional function, there would be reasons to prefer pcubature. But it doesn't give the right result. The vignette concludes the following.

  1. Vectorized hcubature seems to be a good starting point.

  2. For smooth integrands in low dimensions (≤3), pcubature might be worth trying out. Experiment before using in a production package.

查看更多
Fickle 薄情
5楼-- · 2020-08-25 09:50

Interesting workaround: not too surprisingly, integrate does well when the values sampled (on (-Inf,Inf), no less) are closer to the "center" of the data. You can reduce this by using your function but hinting at a center:

Without adjustment:

t(sapply(-10:10, function(i) integrate(function (x) dnorm(x, i, 0.07), -Inf, Inf, subdivisions = 10000L)))
#       value        abs.error    subdivisions message call      
#  [1,] 0            0            1            "OK"    Expression
#  [2,] 1            4.611403e-05 10           "OK"    Expression
#  [3,] 6.619713e-19 1.212066e-18 2            "OK"    Expression
#  [4,] 7.344551e-71 0            2            "OK"    Expression
#  [5,] 3.389557e-06 6.086176e-06 3            "OK"    Expression
#  [6,] 2.127372e-23 3.849798e-23 2            "OK"    Expression
#  [7,] 1            3.483439e-05 8            "OK"    Expression
#  [8,] 1            6.338078e-07 11           "OK"    Expression
#  [9,] 1            3.408389e-06 7            "OK"    Expression
# [10,] 1            6.414833e-07 8            "OK"    Expression
# [11,] 1            7.578907e-06 3            "OK"    Expression
# [12,] 1            6.414833e-07 8            "OK"    Expression
# [13,] 1            3.408389e-06 7            "OK"    Expression
# [14,] 1            6.338078e-07 11           "OK"    Expression
# [15,] 1            3.483439e-05 8            "OK"    Expression
# [16,] 2.127372e-23 3.849798e-23 2            "OK"    Expression
# [17,] 3.389557e-06 6.086176e-06 3            "OK"    Expression
# [18,] 7.344551e-71 0            2            "OK"    Expression
# [19,] 6.619713e-19 1.212066e-18 2            "OK"    Expression
# [20,] 1            4.611403e-05 10           "OK"    Expression
# [21,] 0            0            1            "OK"    Expression

If we add a "centering" hint, though, we get more consistent results:

t(sapply(-10:10, function(i) integrate(function (x, offset) dnorm(x + offset, i, 0.07), -Inf, Inf, subdivisions = 10000L, offset = i)))
#       value abs.error    subdivisions message call      
#  [1,] 1     7.578907e-06 3            "OK"    Expression
#  [2,] 1     7.578907e-06 3            "OK"    Expression
#  [3,] 1     7.578907e-06 3            "OK"    Expression
#  [4,] 1     7.578907e-06 3            "OK"    Expression
#  [5,] 1     7.578907e-06 3            "OK"    Expression
#  [6,] 1     7.578907e-06 3            "OK"    Expression
#  [7,] 1     7.578907e-06 3            "OK"    Expression
#  [8,] 1     7.578907e-06 3            "OK"    Expression
#  [9,] 1     7.578907e-06 3            "OK"    Expression
# [10,] 1     7.578907e-06 3            "OK"    Expression
# [11,] 1     7.578907e-06 3            "OK"    Expression
# [12,] 1     7.578907e-06 3            "OK"    Expression
# [13,] 1     7.578907e-06 3            "OK"    Expression
# [14,] 1     7.578907e-06 3            "OK"    Expression
# [15,] 1     7.578907e-06 3            "OK"    Expression
# [16,] 1     7.578907e-06 3            "OK"    Expression
# [17,] 1     7.578907e-06 3            "OK"    Expression
# [18,] 1     7.578907e-06 3            "OK"    Expression
# [19,] 1     7.578907e-06 3            "OK"    Expression
# [20,] 1     7.578907e-06 3            "OK"    Expression
# [21,] 1     7.578907e-06 3            "OK"    Expression

I recognize this is mitigation for heuristics, presumes knowing something about your distribution before integration, and is not a perfect "generic" solution. Just offering another perspective.

查看更多
登录 后发表回答