Let's say that we have two samples data1
and data2
with their respective weights weight1
and weight2
and that we want to calculate the Kolmogorov-Smirnov statistic between the two weighted samples.
The way we do that in python follows:
def ks_w(data1,data2,wei1,wei2):
ix1=np.argsort(data1)
ix2=np.argsort(data2)
wei1=wei1[ix1]
wei2=wei2[ix2]
data1=data1[ix1]
data2=data2[ix2]
d=0.
fn1=0.
fn2=0.
j1=0
j2=0
j1w=0.
j2w=0.
while(j1<len(data1))&(j2<len(data2)):
d1=data1[j1]
d2=data2[j2]
w1=wei1[j1]
w2=wei2[j2]
if d1<=d2:
j1+=1
j1w+=w1
fn1=(j1w)/sum(wei1)
if d2<=d1:
j2+=1
j2w+=w2
fn2=(j2w)/sum(wei2)
if abs(fn2-fn1)>d:
d=abs(fn2-fn1)
return d
where we just modify to our purpose the classical two-sample KS statistic as implemented in Press, Flannery, Teukolsky, Vetterling - Numerical Recipes in C - Cambridge University Press - 1992 - pag.626.
Our questions are:
- is anybody aware of any other way to do it?
- is there any library in python/R/* that performs it?
- what about the test? Does it exist or should we use a reshuffling procedure in order to evaluate the statistic?
Studying the
scipy.stats.ks_2samp
code we were able to find a more efficient python solution. We share below in case anyone is interested:To evaluate the performance we performed the following test:
ks_w2(ds1, ds2, we1, we2)
took 11.7ms on our machine, whileks_w(ds1, ds2, we1, we2)
took 1.43s