Decimal points - Probability value of 0 in Language R

Published 2019-05-28 22:13

Question:

How should very small p-values be handled in R?

I am expecting very low p-values, like:

1.00E-80

I need to take their -log10:

-log10(1.00E-80)

-log10(0) is Inf, but that Inf only arises from rounding (underflow), not because the true p-value is zero.

But it seems that below 1.00E-308, R yields 0:

1/10^308
[1] 1e-308

1/10^309
[1] 0

Is the precision of the p-values displayed by the lm function limited by this same cutoff, 1e-308? Or is it simply designed so that a cutoff is needed, in which case I should pick my own cutoff point, say 1e-100, and replace 0 with <1e-100?

Answer 1:

There are a variety of possible answers -- which one is most useful depends on the context:

  • R is indeed incapable, under ordinary circumstances, of storing floating-point values closer to zero than .Machine$double.xmin, which varies by platform but is typically (as you discovered) on the order of 1e-308. If you really need to work with numbers this small and can't find a way to work on the log scale directly, search Stack Overflow or the R wiki for methods of dealing with arbitrary/extended-precision values (but you should probably try to work on the log scale -- it will be much less of a hassle).
  • in many circumstances R actually computes p-values on the (natural) log scale internally, and can, if requested, return the log values rather than exponentiating them before giving the answer. For example, dnorm(-100,log=TRUE) gives -5000.919. You can convert directly to the log10 scale (without exponentiating and then taking log10) by dividing by log(10): dnorm(-100,log=TRUE)/log(10) = -2171.9, and the corresponding raw value, 10^-2171.9, would be far too small to represent in floating point. For the p*** (cumulative distribution function) functions, use log.p=TRUE rather than log=TRUE. (This particular point depends heavily on your context; even if you are not using built-in R functions, you may be able to find a way to extract results on the log scale. A short demonstration appears after this list.)
  • in some cases R presents p-value results as being <2.2e-16 even when a more precise value is known: (t1 <- t.test(rnorm(10,100),rnorm(10,80)))

prints

....
t = 56.2902, df = 17.904, p-value < 2.2e-16

but you can still extract the precise p-value from the result:

> t1$p.value
[1] 1.856174e-18

(in many cases this behaviour is controlled by the format.pval() function)
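A short base-R sketch tying these points together (all functions are base R; the printed values are approximate, and the exact "< eps" string from format.pval() depends on the digits option):

## the hard floor for ordinary doubles:
.Machine$double.xmin                  ## typically 2.225074e-308

## computing on the log scale sidesteps the underflow entirely:
dnorm(-100, log = TRUE)               ## -5000.919
dnorm(-100, log = TRUE) / log(10)     ## ~ -2171.9, i.e. log10 of the density
pnorm(-100, log.p = TRUE) / log(10)   ## same idea for tail probabilities

## format.pval() is what renders tiny p-values as "< 2.2e-16":
format.pval(1.856174e-18)             ## values below eps print as "< eps"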

An illustration of how all this would work with lm:

d <- data.frame(x=rep(1:5,each=10))
set.seed(101)
d$y <- rnorm(50,mean=d$x,sd=0.0001)
lm1 <- lm(y~x,data=d)

summary(lm1) prints the p-value of the slope as <2.2e-16, but if we use coef(summary(lm1)) (which does not use the p-value formatting), we can see that the value is 9.690173e-203.
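Concretely (a quick check; "Pr(>|t|)" is the standard column name in the coefficient matrix returned by coef(summary(...))):

summary(lm1)                          ## slope p-value prints only as <2e-16
coef(summary(lm1))["x", "Pr(>|t|)"]   ## full value, ~9.690173e-203 with this seed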

A more extreme case:

set.seed(101); d$y <- rnorm(50,mean=d$x,sd=1e-7)
lm2 <- lm(y~x,data=d)
coef(summary(lm2))

shows that the p-value has actually underflowed to zero. However, we can still get an answer on the log scale:

tval <- coef(summary(lm2))["x","t value"]
## on the log scale, the two-sided factor of 2 becomes an additive log(2);
## multiplying the log itself by 2 would wrongly square the p-value
(log(2) + pt(abs(tval), df = 48, lower.tail = FALSE, log.p = TRUE)) / log(10)

gives a log10 p-value of about -346, i.e. p is roughly 10^-346, far below what a double can hold. You can check this approach against the previous example, where the p-value doesn't underflow, and see that you get the same answer as the precise value extracted above.
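For instance (a quick check, reusing lm1 from above and assuming the same seed):

tval1 <- coef(summary(lm1))["x", "t value"]
(log(2) + pt(abs(tval1), df = 48, lower.tail = FALSE, log.p = TRUE)) / log(10)
## ~ -202.01, and 10^(-202.01) matches the 9.690173e-203 extracted above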



Answer 2:

Small numbers are generally hard to deal with.

This limit in R is caused by its use of double-precision floating point:

From ?double:

"All R platforms are required to work with values conforming to the IEC 60559 (also known as IEEE 754) standard. This basically works with a precision of 53 bits, and represents to that precision a range of absolute values from about 2e-308 to 2e+308."

http://en.wikipedia.org/wiki/Double_precision_floating-point_format
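You can inspect these limits from within R via the built-in .Machine constants (the comments show the usual IEEE 754 values):

.Machine$double.xmin   ## 2.225074e-308, smallest positive normalized double
.Machine$double.xmax   ## 1.797693e+308, largest finite double
.Machine$double.eps    ## 2.220446e-16, machine epsilon (the "2.2e-16" above)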

You may find the Rmpfr package helpful here, as it allows you to create multiple-precision numbers.

install.packages("Rmpfr")
require(Rmpfr)

## the tiny value must be constructed in mpfr arithmetic:
## mpfr(1/10^309, ...) would return 0, because 1/10^309 already
## underflows in double precision before mpfr ever sees it
log(1/mpfr(10, precBits = 500)^309)
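This returns about -711.50 (= -309 * log(10)) rather than -Inf. Applied to the original -log10(p) need, a minimal sketch (this assumes Rmpfr's Math-group methods, which include log10, for mpfr numbers):

p <- 1/mpfr(10, precBits = 500)^309   ## a p-value far below the double range
-log10(p)                             ## recovers 309 exactly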