-->

Data Mining Operation using SQL Query (Fuzzy Aprio

2019-02-16 21:24发布

问题:

So i have this Table :

Trans_ID    Name    Fuzzy_Value    Total_Item  
100          I1  0.33333333        3  
100          I2  0.33333333        3  
100          I5  0.33333333        3  
200          I2  0.5               2  
200          I5  0.5               2  
300          I2  0.5               2  
300          I3  0.5               2  
400          I1  0.33333333        3  
400          I2  0.33333333        3  
400          I4  0.33333333        3  
500          I1  0.5               2  
500          I3  0.5               2  
600          I2  0.5               2  
600          I3  0.5               2  
700          I1  0.5               2  
700          I3  0.5               2  
800          I1  0.25              4  
800          I2  0.25              4  
800          I3  0.25              4  
800          I5  0.25              4  
900          I1  0.33333333        3  
900          I2  0.33333333        3  
900          I3  0.33333333        3  
1000         I1  0.2               5  
1000         I2  0.2               5  
1000         I4  0.2               5  
1000         I6  0.2               5  
1000         I8  0.2               5  

And 2 Blank Table :

Table  ITEMSET

"ITEM_SET" "Support" 



Table Confidence

"ANTECEDENT" "CONSEQUENT" 

I need to find FUZZY value for each item that occur in each transaction:

I1 = Sum of (Fuzzy_Value from item I1 in trans 100 until 1000 which is trans: 100,400,500,700,800,900,1000)/Total Trans  
-> (.33333333+0.33333333+0.5+0.5+0.25+0.33333333+0.2)/10 = 0.244999999


I2 = Sum of (Fuzzy_Value from item I2 in trans 100 - 1000 which is trans:100,200,300,400,600,800,900,1000)/Total Trans  
-> (0.33333333+0.5+0.5+0.33333333+0.5+0.25+0.33333333)/10 = 0.274999999


I3 -> 0.258333333  
I4 -> 0.103333333  
I5 -> 0.058333333    
I6 -> 0.02    
I8 -> 0.02    

For EX: i use minimum Support 10% -> 0.1
I need to remove I5,I6,I8 since it's value < 0.1 => prune step

then store

I1=0.244999999, I2=0.274999999, I3=0.258333333,I4=0.103333333  on new table 'ITEMSET' 

2 COMBINATION

NOTE: this is the basic 1st step after this most likely need to use repeat or recursive, since the process will keep go on until no other item combination is possible
then from what's left i need to find K+1 itemset (which is 2 combination itemset) => join step

{I1,I2} =Sum of (Fuzzy_Value from item I1 + I2 in trans 100 - 1000 which is trans:100,400,800,900,1000)/Total Trans 
->(0.666666667+0.666666667+0.5+0.666666667+0.4)/9 = 0.29

*do the same for the rest*
{I1,I3} =(1+1+0.5+0.666666667)/9 = 0.316666667
{I1,I4} =(0.666666667+0.4)/9 = 0.106666667
{I2,I3} =(1+1+0.5+0.666666667)/9 = 0.316666667
{I2,I4} =(1+0.666666667+0.4)/9 =0.206666667
{I3,I4} =0  

Then Do another Prune Step removing less than 0.1 value which is {I3,I4}

Store {I1,I2} = 0.29, {I1,I3} = 0.316666667, {I1,I4} =0.106666667, {I2,I3} = 0.316666667, {I2,I4} = 0.206666667  AT "ITEMSET" TABLE 

3 COMBINATION

After that Do another JOIN STEP combining itemset that pass pruning

{I1,I2,I3} = Sum of (Fuzzy_Value from item I1 + I2 +I3 in trans 100 - 1000 which is trans:800,900)/Total Trans  
-> 0.75+1 = 0.175  
**Same for the rest**  
{I1,I2,I4} = 1+0.6 = 0.16  
{I2,I3,I4} = 0  

Do another Prune Step removing less than 0.1 value which is {I1,I3,I4}

Store {I1,I2,I3} = 0.176 AND {I1,I2,I4} = 0,16 AT "ITEMSET" TABLE  

4 COMBINATION

Combine itemset that pass pruning K+4 (4 combination)

{I1,I2,I3,I4} = 0

**since no transaction containing this item

after process stop since there's no possible combination left


at this point ITEMSET database have :

ITEM_SET           Support  
{I1}               0.244999999
{I2}               0.274999999     
{I3}               0.258333333    
{I4}               0.103333333  
{I1,I2}            0.29    
{I1,I3}            0.316666667  
{I1,I4}            0.106666667  
{I2,I3}            0.316666667  
{I2,I4}            0.206666667  
{I1,I2,I3}         0.176  
{I1,I2,I4}         0,16  

how do i code that in sql? thank you very much

*note: you can add another table as needed

回答1:

Step 1:

CREATE TABLE ITEMSET
SELECT Name, SUM(Fuzzy_Value)/COUNT(*) Fuzzy_Value
FROM trans
GROUP BY ID
HAVING ROUND(SUM(Fuzzy_Value), 1) >= 0.1

Note the ROUND() function - it's important, because you have values like .33333 that don't sum in a happy way.

Step 2:

ALTER TABLE ITEMSET ADD INDEX (Name)

SELECT a.Name Name1, b.Name Name2, SUM(Fuzzy_Value)/COUNT(*) Fuzzy_Value
FROM ITEMSET a JOIN ITEMSET b ON (a.Name != b.Name)
GROUP BY a.Name, b.Name
HAVING ROUND(SUM(Fuzzy_Value), 1) >= 0.1

Opps: I just noticed that you asked this half a year ago, so I guess there is no point in continuing. If you still need this answer leave a comment.