Clustering transactional data using PAM in R?

Posted 2019-09-03 04:02

Question:

I need to group a set of transactions into different groups. My data is in a text file in this format:

T1  17  20  22  35  37  60  62    
T2  39  51  53  54  57  65  73    
T3  17  20  21  22  34  37  62    
T4  20  22  54  57  65  73  45    
T5  20  54  57  65  73  75  80    
T6  2   20  54  57  59  63  71    
T7  2   20  22  57  59  71  66    
T8  17  20  28  29  30  34  35    
T9  16  20  28  32  54  57  65    
T10 16  20  22  28  57  59  71    
-    
-

and so on, for more than 5000 lines. Each line represents one transaction.

What I did so far:

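library("arules")   # read.transactions() and dissimilarity()
library("cluster")  # pam()

# sep = "" splits on runs of whitespace; cols = 1 marks T1, T2, ... as transaction IDs, not items
txIn <- read.transactions("data2.txt", format = "basket", sep = "", cols = 1)
d <- dissimilarity(txIn, method = "Jaccard")
clustersA <- pam(d, k = 100)
txOut <- "txOut.txt"
write.table(clustersA$clustering, file = txOut, sep = " ")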
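For reference, the Jaccard dissimilarity between two transactions A and B is 1 - |A ∩ B| / |A ∪ B|: 0 for identical item sets and 1 for disjoint ones. A small base-R check on T1 and T3 from the sample data (jaccard_dissim is a hypothetical helper written for illustration, not an arules function):

```r
# Jaccard dissimilarity between two transactions treated as sets of items:
# d(A, B) = 1 - |A intersect B| / |A union B|
jaccard_dissim <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Items of T1 and T3 from the sample data in the question
t1 <- c(17, 20, 22, 35, 37, 60, 62)
t3 <- c(17, 20, 21, 22, 34, 37, 62)

# 5 shared items, 9 distinct items overall: 1 - 5/9
d13 <- jaccard_dissim(t1, t3)
print(d13)
```

pam() then works on the matrix of all such pairwise distances, so transactions sharing many items tend to land in the same cluster.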

but the file stores each transaction's number together with its cluster, like this:

"x"
"1" 1
"2" 1
"3" 1
"4" 1
"5" 1
"6" 2
"7" 2
"8" 2
"9" 1
"10" 2
-
-
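In that file, the first column appears to be each transaction's row position in txIn (1, 2, 3, ...) and the second its cluster number. Grouping the positions by cluster with split() is the first step toward the layout asked for. A minimal sketch, using the ten cluster numbers shown above as a stand-in for clustersA$clustering:

```r
# Stand-in for clustersA$clustering: the cluster numbers of the first
# ten transactions, copied from the output above.
clustering <- c(1, 1, 1, 1, 1, 2, 2, 2, 1, 2)

# Group row positions by cluster: one list element per cluster
members <- split(seq_along(clustering), clustering)
print(members)

# Number of transactions in each cluster
print(table(clustering))
```

With the real result, split(seq_along(clustersA$clustering), clustersA$clustering) gives the same structure for all 100 clusters, and txIn[members[["1"]]] selects the transactions of cluster 1 as a transactions object that can be analysed on its own.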

and I need to save it grouped by cluster instead, for example:

cluster 1:

T1  17  20  22  35  37  60  62    
T2  39  51  53  54  57  65  73    
T3  17  20  21  22  34  37  62    
T4  20  22  54  57  65  73  45    
T5  20  54  57  65  73  75  80

T9  16  20  28  32  54  57  65

cluster 2:

T6  2   20  54  57  59  63  71    
T7  2   20  22  57  59  71  66    
T8  17  20  28  29  30  34  35        
T10 16  20  22  28  57  59  71    
    -
    -

and so on, because I want to process each cluster separately.
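One way to produce that layout is to split the original text lines by cluster and write each group under a "cluster k:" header. This is a sketch under an assumption: that readLines("data2.txt") returns exactly one line per transaction, in the same order read.transactions() read them (no header, no blank lines). The names lines, by_cluster and out_file are hypothetical, and small stand-in data replaces the real file and clustering so the example runs on its own:

```r
# Hypothetical stand-ins: in the real script, `lines` would come from
# readLines("data2.txt") and `clustering` from clustersA$clustering.
lines <- c(
  "T1  17  20  22  35  37  60  62",
  "T2  39  51  53  54  57  65  73",
  "T3  17  20  21  22  34  37  62",
  "T6  2   20  54  57  59  63  71"
)
clustering <- c(1, 1, 1, 2)

# One element per cluster, each a character vector of the original lines
by_cluster <- split(lines, clustering)

out_file <- tempfile(fileext = ".txt")  # e.g. "clusters.txt" in practice
con <- file(out_file, open = "w")
for (k in names(by_cluster)) {
  writeLines(paste0("cluster ", k, ":"), con)
  writeLines(by_cluster[[k]], con)
  writeLines("", con)  # blank line between clusters
}
close(con)

cat(readLines(out_file), sep = "\n")
```

To get one file per cluster instead, the loop body could be replaced with writeLines(by_cluster[[k]], paste0("cluster_", k, ".txt")); the file names there are only an example.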

I have searched a lot; any information, example, documentation, or other help would be appreciated.

Answer 1:

Are you sure you want to do clustering?

To me, it sounds like you might be more interested in frequent itemset mining.
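To make the suggestion concrete: frequent itemset mining looks for sets of items that occur together in many transactions (high support) instead of assigning every transaction to a group. A deliberately brute-force base-R sketch that only counts item pairs, on three transactions from the question; min_count is an illustrative threshold, not a recommended value:

```r
# Toy data: items of T1, T3 and T8 from the question's sample
tx <- list(
  T1 = c(17, 20, 22, 35, 37, 60, 62),
  T3 = c(17, 20, 21, 22, 34, 37, 62),
  T8 = c(17, 20, 28, 29, 30, 34, 35)
)
min_count <- 2  # illustrative threshold: pair must occur in >= 2 transactions

# Every unordered item pair in each transaction, encoded as "a b"
pairs <- unlist(lapply(tx, function(items) {
  combos <- combn(sort(unique(items)), 2)
  paste(combos[1, ], combos[2, ])
}))

# Count transactions containing each pair, keep the frequent ones and
# report their support (fraction of transactions containing the pair)
pair_count <- table(pairs)
counts <- setNames(as.vector(pair_count), names(pair_count))
frequent <- counts[counts >= min_count]
support <- sort(frequent / length(tx), decreasing = TRUE)
print(support)
```

Enumerating pairs this way does not scale to 5000 transactions or to larger itemsets; the arules package already used in the question provides apriori() and eclat() for that.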