Reveal k-modes cluster features

Posted 2019-03-16 22:49

Question:

I'm performing a cluster analysis on categorical data, hence the k-modes approach.

My data comes from a preference survey: How do you like hair and eyes?

The respondent can pick one answer from a fixed (multiple-choice) set of 4 possibilities.

I therefore get the dummies, apply k-modes, attach the clusters back to the initial df, and then plot them in 2D with PCA.

My code looks like:

import numpy as np
import pandas as pd
from kmodes import kmodes

df_dummy = pd.get_dummies(df)

# transform into a numpy array (without reset_index(), which would add the
# row index as an extra feature column)
x = df_dummy.values

km = kmodes.KModes(n_clusters=3, init='Huang', n_init=5, verbose=0)
clusters = km.fit_predict(x)
df_dummy['clusters'] = clusters


import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA(2)

# Turn the dummified df into two columns with PCA
# (.ix has been removed from pandas; .iloc selects the first 12 columns by position)
plot_columns = pca.fit_transform(df_dummy.iloc[:, 0:12])

# Plot based on the two dimensions, and shade by cluster label
plt.scatter(x=plot_columns[:,1], y=plot_columns[:,0], c=df_dummy["clusters"], s=30)
plt.show()

and I can visualize:

Now my problem is: can I somehow reveal the distinctive features of each cluster? That is, what are the main characteristics (maybe blond hair and blue eyes) of the group of green dots in the scatterplot?

I understand that the clustering has happened, but I can't find a way to interpret what the clusters actually mean.

Should I play with the .labels_ attribute?

Answer 1:

Take a look at km.cluster_centroids_. This will give the mode of each variable for each cluster.
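
To make this concrete, here is a minimal sketch of what "the mode of each variable for each cluster" looks like in practice. It uses only pandas and recomputes the per-cluster modes from the labels, which should match the idea behind km.cluster_centroids_. The survey answers, the labels list and the helper name cluster_modes are made up for illustration; in your code the labels would come from km.fit_predict(...) or km.labels_.

```python
import pandas as pd

# Hypothetical survey answers (invented for illustration only).
df = pd.DataFrame({
    "hair": ["blond", "blond", "brown", "brown", "black", "blond"],
    "eyes": ["blue", "blue", "brown", "green", "brown", "blue"],
})

# Hypothetical cluster labels, standing in for the output of km.fit_predict(...).
labels = pd.Series([0, 0, 1, 1, 1, 0], index=df.index, name="cluster")


def cluster_modes(frame, labels):
    """Return the most frequent category of each column within each cluster."""
    return frame.groupby(labels).agg(lambda s: s.mode().iloc[0])


# One row per cluster, one column per question: the "typical" respondent.
profile = cluster_modes(df, labels)
print(profile)

# Share of each hair colour inside each cluster, to see how dominant the mode is.
hair_shares = pd.crosstab(labels, df["hair"], normalize="index")
print(hair_shares)
```

If you fit k-modes on the dummy columns, each centroid is a row of 0/1 values, one per dummy column, so pairing it with df_dummy.columns tells you which answers are "on" in that cluster. The crosstab is an optional extra that shows how strongly a cluster leans towards its most common answer.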