
Why can't I get a p-value smaller than 2.2e-16

Posted 2019-02-01 18:31

Question:

I've run into this issue with t-tests and chi-squared tests in R, but I assume it applies to other tests as well. If I do:

a <- 1:10
b <- 100:110
t.test(a,b) 

I get: t = -64.6472, df = 18.998, p-value < 2.2e-16. I know from the comments that 2.2e-16 is the value of .Machine$double.eps, the smallest floating-point number x such that 1 + x != 1, but of course R can represent numbers much smaller than that. I also know from the R FAQ that R has to round floats to 53 binary digits of accuracy.

A few questions:
(1) Am I correct in reading that as 53 binary digits of precision, or are values in R smaller than .Machine$double.eps not calculated accurately?
(2) Why, when doing such calculations, does R not provide a means to display a smaller value for the p-value, even with some loss of precision?
(3) Is there a way to display a smaller p-value, even if I lose some precision? For a single test, two significant figures would be fine; for values I am going to Bonferroni-correct, I'll need more. When I say "lose some precision" I mean fewer than 53 binary digits.
(4) Or am I completely mistaken, and is any p-value smaller than .Machine$double.eps wildly inaccurate?
(5) Is R just being honest while other stats packages are not?

In my field very small p-values are the norm; some examples: http://www.ncbi.nlm.nih.gov/pubmed/20154341 and http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002215. This is why I want to be able to represent such small p-values.

Thanks for your help, sorry for such a tortuous question.

Answer 1:

Try something like t.test(a,b)$p.value and see if that gives you the accuracy you need. I believe this has more to do with how the result is printed than with the actual stored value, which should have the necessary precision.
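
A minimal sketch using the numbers from the question; the printed summary stops at 2.2e-16, but the stored component does not:

a <- 1:10
b <- 100:110
res <- t.test(a, b)
res            # printed summary reports "p-value < 2.2e-16"
res$p.value    # the stored double, far smaller than 2.2e-16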



Answer 2:

I'm puzzled by several things in the exchange of answers and comments here.

First of all, when I try the OP's original example I don't get a p value as small as the ones that are being debated here (several different 2.13.x versions and R-devel):

a <- 1:10
b <- 10:20
t.test(a,b)
## data:  a and b 
## t = -6.862, df = 18.998, p-value = 1.513e-06

Second, when I make the difference between groups much bigger, I do in fact get the results suggested by @eWizardII:

a <- 1:10
b <- 110:120
(t1 <- t.test(a,b))
# data:  a and b 
# t = -79.0935, df = 18.998, p-value < 2.2e-16
#
t1$p.value
# [1] 2.138461e-25

The behavior of the printed output in t.test is driven by its call to stats:::print.htest (which is also called by other statistical testing functions such as chisq.test, as noted by the OP), which in turn calls format.pval, which presents p values less than its value of eps (which is .Machine$double.eps by default) as < eps. I'm surprised to find myself disagreeing with such generally astute commenters ...
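
To see just the formatting step, format.pval can be called directly; its eps argument defaults to .Machine$double.eps, as noted above. Using the t1 object from the example above:

format.pval(t1$p.value)            # below the default eps, so it is shown as "< ..."
format.pval(t1$p.value, eps = 0)   # with eps lowered, the tiny value itself is printed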

Finally, although it seems silly to worry about the precise value of a very small p value, the OP is correct that these values are often used as indices of strength of evidence in the bioinformatics literature -- for example, one might test 100,000 candidate genes and look at the distribution of resulting p values (search for "volcano plot" for one example of this sort of procedure).



Answer 3:

Two questions:

1) What possible difference in statistical implication would there be between p-values of 1e-16 and 1e-32? If you truly can justify it, then working with the logged values is the way to go (a short sketch appears at the end of this answer).

2) Why do you use Wikipedia when your interest is in the numerical accuracy of R?

The R FAQ says "Other [meaning non-integer] numbers have to be rounded to (typically) 53 binary digits accuracy." Roughly 16 decimal digits is the limit. This is how to see the limit of accuracy at the console:

> .Machine$double.eps
[1] 2.220446e-16

That number is effectively zero when interpreted on a range of [0, 1].
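
As a sketch of the log-scale route mentioned in point 1) above, using the t statistic and degrees of freedom from the example in another answer (t = -79.0935, df = 18.998), the whole calculation can stay on the log scale:

tstat <- -79.0935
dof   <- 18.998
# two-sided log p-value, computed without ever forming the tiny number itself
log_p   <- pt(abs(tstat), df = dof, lower.tail = FALSE, log.p = TRUE) + log(2)
log10_p <- log_p / log(10)
log10_p        # about -24.7, i.e. a p-value on the order of 1e-25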



Answer 4:

The Wikipedia page you linked to was for the Decimal64 type which R does not use – it uses standard-issue doubles.

First, some definitions from the .Machine help page.

double.eps: the smallest positive floating-point number ‘x’ such that ‘1 + x != 1’. ... Normally ‘2.220446e-16’.

double.xmin: the smallest non-zero normalized floating-point number ... Normally ‘2.225074e-308’.

So you can represent numbers smaller than 2.2e-16, but their accuracy is diminished, and this causes problems with calculations. Try some examples with numbers at or below the smallest representable value:

2e-350 - 1e-350   # both constants underflow to 0, so the result is 0
sqrt(1e-350)      # also 0: 1e-350 is below the smallest representable double

You mentioned in a comment that you wanted to do Bonferroni corrections. Rather than rolling your own code for this, I suggest that you use p.adjust(your_p_value, method = "bonferroni") instead. pairwise.t.test uses this.
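
A small sketch of p.adjust with made-up p-values, purely for illustration:

pvals <- c(2.1e-25, 3.8e-7, 0.012, 0.2)   # hypothetical raw p-values
p.adjust(pvals, method = "bonferroni")    # each multiplied by length(pvals), capped at 1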



Answer 5:

Some R packages solve this issue. The best way is through package pspearman.

source("http://www.bioconductor.org/biocLite.R")
biocLite("pspearman")
library("pspearman")
a=c(1:110,110)
b=1:111
out <- spearman.test(a, b, alternative = "greater", approximation="t-distribution")
out$p.value

[1] 3.819961e-294



Answer 6:

I recently had the same problem. A fellow statistician recommended computing the p-value directly from the test statistic:

A <- cor.test(…)
p <- 2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)
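
For illustration, a minimal sketch with invented data, assuming the default Pearson correlation test, whose statistic is t-distributed with A$parameter degrees of freedom:

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.1)   # strongly correlated toy data
A <- cor.test(x, y)            # default Pearson test
A$p.value                      # stored value, far below 2.2e-16
2 * pt(abs(A$statistic), df = A$parameter, lower.tail = FALSE)   # the same value, by hand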