My data has ten million binary attributes, but only some of them are informative; most of them are zero.
The format looks like this:
data attribute1 attribute2 attribute3 attribute4 .........
A 0 1 0 1 .........
B 1 0 1 0 .........
C 1 1 0 1 .........
D 1 1 0 0 .........
What is a smart way to cluster this? I know k-means clustering, but I don't think it's suitable in this case, because binary values make distances less meaningful, and it will suffer from the curse of dimensionality. Even if I cluster on only the few informative attributes, there are still too many attributes.
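To show what I mean about the distances, here is a minimal sketch in Python (my choice, just for illustration). It uses only the four attribute columns visible in the sample above, not the real data, and stores each row as the set of attribute indices that are 1. The names `rows`, `hamming`, and `pairwise_distances` are made up for this example. For 0/1 vectors, the squared Euclidean distance that k-means works with is the same as the Hamming distance, i.e. the number of attributes on which two rows differ.

```python
from itertools import combinations

# Only the four attribute columns shown in the sample; the real data has far more.
# Each row is stored as the set of 1-based attribute indices whose value is 1.
rows = {
    "A": {2, 4},     # 0 1 0 1
    "B": {1, 3},     # 1 0 1 0
    "C": {1, 2, 4},  # 1 1 0 1
    "D": {1, 2},     # 1 1 0 0
}


def hamming(x, y):
    """Count the attributes on which two rows differ.

    For 0/1 vectors this equals the squared Euclidean distance.
    """
    return len(x ^ y)  # size of the symmetric difference of the two index sets


def pairwise_distances(data):
    """Return the Hamming distance for every unordered pair of row labels."""
    return {
        (a, b): hamming(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


distances = pairwise_distances(rows)
for pair, d in distances.items():
    print(pair, d)
```

On these four columns the distances can only be small integers, so several pairs end up tied (A-C and C-D are both 1, A-D and B-D are both 2). That is the kind of thing that makes me doubt distance-based clustering here.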
I think a decision tree would be a nice way to cluster this data, but it's a classification algorithm!
What can I do?