Seaborn Countplot : Display only n most frequent c

Posted 2019-08-20 10:14

Question:

I have a Python array listing all occurrences of string labels. Let's call it labels_array. Using seaborn as sns, I'd like to show a countplot of this array:

sns.countplot(labels_array)

This works, but as there are too many different labels in my array, the output doesn't look good.

Is there a way to display only the n most frequent labels?

Answer 1:

Although countplot computes the counts itself and could in principle show only some of them, it has no option for limiting the plot to the n most frequent labels. Using countplot on its own may therefore not make much sense here.

Instead, just use a normal pandas plot. For example, to show the 5 most frequent items in the list:

pandas.Series(labels_array).value_counts()[:5].plot(kind="bar")

Complete example:

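import string
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample data: 400 lowercase letters drawn with random, unequal probabilities
l = list(string.ascii_lowercase)
n = np.random.rand(len(l))
a = np.random.choice(l, p=n/n.sum(), size=400)

# value_counts() sorts by frequency (descending), so [:5] keeps the 5 most frequent
s = pd.Series(a)
s.value_counts()[:5].plot(kind="bar")

plt.show()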


Answer 2:

I came across the same problem (and this question) and found that it has already been answered elsewhere.

The countplot function has the parameter order, which lets you specify the values whose counts you want to plot. The most frequent values can be obtained, as previously stated, with the value_counts function.
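
A minimal sketch of this approach follows. The helper names top_n_labels and plot_top_n are hypothetical, chosen only for illustration. To keep the counting step free of extra dependencies, it uses collections.Counter.most_common from the standard library in place of value_counts; with pandas, pd.Series(labels_array).value_counts().index[:n] should give a comparable list of labels.

```python
from collections import Counter


def top_n_labels(labels, n):
    """Return the n most frequent labels, most frequent first (hypothetical helper)."""
    return [label for label, _ in Counter(labels).most_common(n)]


def plot_top_n(labels, n):
    """Draw a countplot restricted to the n most frequent labels (hypothetical helper)."""
    # Imported here so the counting helper stays usable without plotting libraries.
    import seaborn as sns

    # `order` limits the categories drawn and fixes their order on the axis.
    return sns.countplot(x=list(labels), order=top_n_labels(labels, n))
```

With the array from the question, calling plot_top_n(labels_array, 5) and then matplotlib's plt.show() should display only the five most frequent labels, sorted from most to least frequent.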

See: limit the number of groups shown in seaborn countplot?