integrate() gives horribly wrong answer:
integrate(function (x) dnorm(x, -5, 0.07), -Inf, Inf, subdivisions = 10000L)
# 2.127372e-23 with absolute error < 3.8e-23
The return value should obviously be 1 (normal distrubution integrates to 1), but integrate() returns ridiculously small number, with wrong error reporting, and no warning...
Any ideas?
This seems the default integrate()
is horribly buggy... and I just found this by chance! Is there any reliable R package to compute numerical integration?
EDIT: I tried package pracma
and I see the same problem! :
require(pracma)
integral(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
# For infinite domains Gauss integration is applied!
# [1] 0
EDIT: Hmm... digging deeper, it seems that he has trouble to find the very narrow domain for the function which is numerically > 0. When I set the limits to certain (very close to 0, 1) quantiles, it starts to work:
integral(function (x) dnorm(x, -5, 0.07), qnorm(1e-10, -5, 0.07), qnorm(1 - 1e-10, -5, 0.07))
But anyway, this is quite horrible gotcha... wonder if there is any remedy for this.
From the online documentation: "Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."
I take this to mean "caveat emptor". I notice that in your example, the absolute error is greater than value of the integral. Given that you know f(x) > 0 for all x, at least it's giving you the chance to spot that something has gone wrong. It's down to you to take the opportunity.
Gives
The warning in the online doc says to me that, given your apparent definition of buggy, the answer to your question is "no, there is no reliable numerical intergration method. Not in R or any other language". No numerical integration technique should be used blindly. The user needs to check their inputs are sensible and the output is reasonable. It's no good believing an answer just because the computer gave it to you.
See also this post.
Expanding a little further on @r2evan's and @Limey's comments:
@Limey: for very general problems like this, there is simply no way to guarantee a generic solution.
One way to solve such problem is to use more knowledge of the properties of the integrand (@r2evans's answer); the answer referenced by @Limey goes into detail for a different problem.
One "gotcha" that you may not have thought of is that trying out a bunch of generic methods, tuning settings, etc. may mislead you into concluding that some settings/methods are generically better than the first one you tried that failed to get the right answer. (Methods that work may work better because they're generically better, but trying them on one example doesn't prove it!)
As an example, the description of
pcubature()
(in?cubature::pcubature
saysHowever, recall that
pcubature()
happens to fail for your example, which is a smooth low-dimensional case - exactly wherepcubature()
is supposed to perform better - which suggests that it may be just luck thathcubature()
works andpcubature()
doesn't in this case.An illustration of how sensitive the results can be to parameters (lower/upper limits in this case):
White squares are successful (integral=1), black squares are bad (integral=0).
Try package
cubature
.Note that function
pcubature
in the same package also returns 0.From
vignette("cubature")
, section Introduction. My emphasis.Since in this case the integrand is the normal density, a smooth and 1-dimensional function, there would be reasons to prefer
pcubature
. But it doesn't give the right result. The vignette concludes the following.Interesting workaround: not too surprisingly,
integrate
does well when the values sampled (on(-Inf,Inf)
, no less) are closer to the "center" of the data. You can reduce this by using your function but hinting at a center:Without adjustment:
If we add a "centering" hint, though, we get more consistent results:
I recognize this is mitigation for heuristics, presumes knowing something about your distribution before integration, and is not a perfect "generic" solution. Just offering another perspective.