-->

Discrete Kolmogorov-Smirnov testing: getting wrong

Posted 2019-04-17 23:55

Question:

I am trying to use the R package dgof from Python 3 via rpy2.

I use it inside Python like so:

# import rpy2's package module
import rpy2.robjects.packages as rpackages

# Import R's utility package
utils = rpackages.importr('utils')

# Select a mirror for R packages
utils.chooseCRANmirror(ind=1) # select the first mirror in the list

# R vector of strings
from rpy2.robjects.vectors import StrVector

# Install the R package 'dgof' (discrete goodness-of-fit) if it is missing.
# StrVector needs a list: a bare string would be split into single characters.
if not rpackages.isinstalled('dgof'):
    utils.install_packages(StrVector(['dgof']))

# Import dgof
dgof = rpackages.importr('dgof')

This works like a charm (that is, I can import it, which is a big win in itself). As a test, I wanted to reproduce the example result from the API documentation.

For clarity, in pure R the example is the following (to be clear, this is dgof's own ks.test, NOT stats::ks.test):

ks.test(rep(1, 3), ecdf(1:3))

which results in a p-value of 0.07407 (to verify this, click on the green "Run this code" button in this link). Note that:

> ecdf(1:3)
Empirical CDF 
Call: ecdf(1:3)
 x[1:3] =      1,      2,      3
> rep(1,3)
[1] 1 1 1
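
As a sanity check on the 0.07407 figure, the exact p-value for such a small discrete case can be enumerated by brute force. The sketch below is only an illustration, not dgof's actual algorithm: the helper names `ks_statistic` and `exact_p_value` are hypothetical, and it assumes the null distribution puts equal mass on each distinct value of the support (as `ecdf(1:3)` does).

```python
from fractions import Fraction
from itertools import product


def ks_statistic(sample, support):
    """KS distance between the sample's ECDF and the uniform CDF on `support`.

    Assumes `support` holds distinct values and every sample value lies in it,
    so the supremum is attained at one of the support points.
    """
    n = len(sample)
    m = len(support)
    d = Fraction(0)
    for i, x in enumerate(sorted(support), start=1):
        f_sample = Fraction(sum(1 for s in sample if s <= x), n)
        f_null = Fraction(i, m)
        d = max(d, abs(f_sample - f_null))
    return d


def exact_p_value(sample, support):
    """P(D >= observed D) under the null, by enumerating all equally likely samples."""
    observed = ks_statistic(sample, support)
    outcomes = list(product(support, repeat=len(sample)))
    extreme = sum(1 for o in outcomes if ks_statistic(o, support) >= observed)
    return Fraction(extreme, len(outcomes))


stat = ks_statistic([1, 1, 1], [1, 2, 3])
p = exact_p_value([1, 1, 1], [1, 2, 3])
print(float(stat), float(p))
```

Exact fractions are used so that the `>=` comparison is not affected by floating-point rounding. Only 2 of the 27 possible samples, (1, 1, 1) and (3, 3, 3), are at least as extreme as the observed one, which gives 2/27 ≈ 0.07407.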

In Python the reproduced example is:

import numpy as np
a = np.array([1,1,1])
b = np.arange(1,4)
dgof.ks_test(a,b)

In the Python version, however, the p-value I get is 0.517551. The KS statistic itself is calculated correctly, so why is the p-value different? Again, to see the output of the dgof example, run it on the linked page and you will see the numbers I am referring to (reproduced above).
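
One way to investigate where 0.517551 could come from is to compute what a two-sample test would report. This is an assumption about what happened, not a confirmed diagnosis: if the second argument reaches R as a plain numeric vector rather than an `ecdf` object, `ks.test` performs a two-sample test instead of a one-sample test against a step function. The sketch below uses hypothetical helper names and the standard asymptotic Kolmogorov series; it is not dgof's code.

```python
import math


def two_sample_ks_statistic(x, y):
    """Largest gap between the two empirical CDFs, checked at every observed value."""
    points = sorted(set(x) | set(y))
    return max(
        abs(sum(v <= t for v in x) / len(x) - sum(v <= t for v in y) / len(y))
        for t in points
    )


def asymptotic_p_value(d, n, m, terms=100):
    """Two-sided asymptotic p-value from the Kolmogorov limiting distribution."""
    z2 = d * d * n * m / (n + m)
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * z2) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * series))


x = [1, 1, 1]
y = [1, 2, 3]
d = two_sample_ks_statistic(x, y)
p_two_sample = asymptotic_p_value(d, len(x), len(y))
print(d, p_two_sample)
```

The statistic is 2/3 in both the one-sample and the two-sample reading, which would explain why the KS statistic looks right while the p-value does not. The asymptotic two-sample p-value comes out at about 0.51755, which agrees with the number seen from Python.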