SQL Query for Chi-SQUARE TEST [duplicate]

2020-06-30 04:35发布

问题:

I am trying to find the CHI-SQUARE TEST on the following set of data in the table. I am trying my this Query to find the CHI-SQUARE TEST:

 SELECT sessionnumber, sessioncount, timespent,
 (dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected
 FROM (SELECT sessionnumber, SUM(cast(cnt as bigint)) as cnt
 FROM d3
 GROUP BY sessionnumber) dim1 CROSS JOIN
 (SELECT sessioncount, SUM(cast(cnt as bigint)) as cnt
 FROM d3
 GROUP BY sessioncount) dim2 CROSS JOIN
 (SELECT timespent, SUM(cast(cnt as bigint)) as cnt
 FROM d3
 GROUP BY timespent) dim3 CROSS JOIN
 (SELECT SUM(cast(cnt as bigint)) as cnt FROM d3) dimall

The sample data is:

sessionnumber   sessioncount    timespent       cnt
1                  17               28          45
2                  22               8           30
3                  1                1           2
4                  1                1           2
5                  8               111          119
6                  8                65          73
7                  11               5           16
8                  1                1           2
9                  62               64          126
10                 6                42          48

But it gives me wrong output for the chi-square test values the output it gives is:

sessionnumber   sessioncount    timespent   expected
1                  23               1          0
2                  23               1          0
3                  23               1          0
4                  23               1          0
5                  23               1          0
6                  23               1          0
7                  23               1          0
8                  23               1          0
9                  23               1          0
10                 23               1          0

I have tried my best and searched alot about this problem. Please do me a favor and kindly fix the problem! Thanks in advance!

回答1:

integer math, cast dimall.cnt to a decimal or numeric or do the following

/(dimall.cnt* 1.00)* (dimall.cnt * 1.00)

another example to explain what actually happens

select 3/2  -- output = 1, integer math, result is an integer

select 3/2.00  -- output = 1.50


回答2:

Because you are already doing casts in your calculations, you might as well cast to float instead of bigint

 SELECT sessionnumber, sessioncount, timespent,
 (dim1.cnt * dim2.cnt * dim3.cnt)/(dimall.cnt*dimall.cnt) as expected
 FROM (SELECT sessionnumber, SUM(cast(cnt as float)) as cnt
 FROM d3
 GROUP BY sessionnumber) dim1 CROSS JOIN
 (SELECT sessioncount, SUM(cast(cnt as float)) as cnt
 FROM d3
 GROUP BY sessioncount) dim2 CROSS JOIN
 (SELECT timespent, SUM(cast(cnt as float)) as cnt
 FROM d3
 GROUP BY timespent) dim3 CROSS JOIN
 (SELECT SUM(cast(cnt as float)) as cnt FROM d3) dimall;

float has something like 16 digits of precision, so it should be adequate for counting any reasonable number of objects in the known universe.