sklearn implements one agglomerative clustering algorithm, Ward's method, which minimizes variance. sklearn is usually documented with plenty of usage examples, but I couldn't find any that show how to use the ward_tree function.
My goal is to draw a dendrogram from the clustering of my data, but I don't understand the function's output. The documentation says that it returns the children, the number of components, the number of leaves, and the parent of each node.
Yet for my data, the results don't make sense to me. For a (32, 542) matrix clustered with a connectivity matrix, this is the output:
>>> wt = ward_tree(mymat, connectivity=connectivity, n_clusters=2)
>>> mymat.shape
(32, 542)
>>> wt
(array([[16,  0],
       [17,  1],
       [18,  2],
       [19,  3],
       [20,  4],
       [21,  5],
       [22,  6],
       [23,  7],
       [24,  8],
       [25,  9],
       [26, 10],
       [27, 11],
       [28, 12],
       [29, 13],
       [30, 14],
       [31, 15],
       [34, 33],
       [47, 46],
       [41, 40],
       [36, 35],
       [45, 44],
       [48, 32],
       [50, 42],
       [38, 37],
       [52, 43],
       [54, 39],
       [53, 51],
       [58, 55],
       [56, 49],
       [60, 57]]), 1, 32, array([32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 32,
       33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 53, 48,
       48, 51, 51, 55, 55, 57, 50, 50, 54, 56, 52, 52, 49, 49, 53, 60, 54,
       58, 56, 58, 57, 59, 60, 61, 59, 59, 61, 61]))
In this case I asked for two clusters from 32 feature vectors. But where are the two clusters in this output? And what do the children really mean here: how can a child index be larger than the total number of samples?
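For what it's worth, here is a minimal sketch of one reading that is at least consistent with the numbers above. It rests on two assumptions I have not been able to confirm: that row i of the children array describes the merge that creates node n_leaves + i (which would explain indices of 32 and above), and that a node listed as its own parent in the last array is the top of one cluster. The names root_of, merges and clusters are my own and are not part of sklearn.

```python
# Values copied by hand from the ward_tree output shown above.
n_leaves = 32
children = [
    [16, 0], [17, 1], [18, 2], [19, 3], [20, 4], [21, 5], [22, 6], [23, 7],
    [24, 8], [25, 9], [26, 10], [27, 11], [28, 12], [29, 13], [30, 14],
    [31, 15], [34, 33], [47, 46], [41, 40], [36, 35], [45, 44], [48, 32],
    [50, 42], [38, 37], [52, 43], [54, 39], [53, 51], [58, 55], [56, 49],
    [60, 57],
]
# The first 32 entries are 32..47 repeated twice; the rest are copied as printed.
parents = list(range(32, 48)) * 2 + [
    53, 48, 48, 51, 51, 55, 55, 57, 50, 50, 54, 56, 52, 52, 49,
    49, 53, 60, 54, 58, 56, 58, 57, 59, 60, 61, 59, 59, 61, 61,
]


def root_of(node, parents):
    """Follow parent links until reaching a node that is its own parent."""
    while parents[node] != node:
        node = parents[node]
    return node


# Assumption: row i of `children` is the merge that creates node n_leaves + i.
merges = {n_leaves + i: tuple(pair) for i, pair in enumerate(children)}

# Assumption: a node that is its own parent is the top of one cluster.
roots = sorted({root_of(leaf, parents) for leaf in range(n_leaves)})

# Group the original samples (leaves 0..31) by the root they lead to.
clusters = {
    root: [leaf for leaf in range(n_leaves) if root_of(leaf, parents) == root]
    for root in roots
}

print(roots)  # [59, 61]
for root, members in clusters.items():
    print(root, members)
```

If this reading is right, the two clusters would be samples 0-6 and 16-22 under node 59, and samples 7-15 and 23-31 under node 61, but I would appreciate confirmation that this is how the output is meant to be read.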