I have created a seaborn countplot for multiple categorical variables of a dataframe, but I want percentages instead of counts.
What is the best option to use? Barplots? Can I use a loop like the one below to produce all the barplots at once?
for i, col in enumerate(df_categorical.columns):
    plt.figure(i)
    sns.countplot(x=col, hue='Response', data=df_categorical)
This loop gives me the countplots for all variables at once.
Thanks!
Data looks like this:
State Response Coverage Education Effective To Date EmploymentStatus Gender Location Code Marital Status Policy Type Policy Renew Offer Type Sales Channel Vehicle Class Vehicle Size
0 Washington No Basic Bachelor 2/24/11 Employed F Suburban Married Corporate Auto Corporate L3 Offer1 Agent Two-Door Car Medsize
1 Arizona No Extended Bachelor 1/31/11 Unemployed F Suburban Single Personal Auto Personal L3 Offer3 Agent Four-Door Car Medsize
2 Nevada No Premium Bachelor 2/19/11 Employed F Suburban Married Personal Auto Personal L3 Offer1 Agent Two-Door Car Medsize
3 California No Basic Bachelor 1/20/11 Unemployed M Suburban Married Corporate Auto Corporate L2 Offer1 Call Center SUV Medsize
4 Washington No Basic Bachelor 2/3/11 Employed M Rural Single Personal Auto Personal L1 Offer1 Agent Four-Door Car Medsize
Consider a groupby.transform to calculate a percentage column for each variable, then run barplot with x set to the original value column and y set to the percent column.
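To make the percentage step concrete, here is a minimal sketch on a small made-up frame (the `toy` data and the `Gender_pct` column name are assumptions for illustration, not the posted dataset). It uses `transform("size")`, which should give the same group sizes as a `len` lambda, and divides by the total row count, so each value is a share of all rows rather than a share within each Response group.

```python
import pandas as pd

# Hypothetical toy data, only for illustrating the calculation
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# Size of each (Response, Gender) group, broadcast back onto every row,
# then divided by the total number of rows to get a fraction of all rows.
toy["Gender_pct"] = (
    toy.groupby(["Response", "Gender"])["Gender"].transform("size") / len(toy)
)

print(toy)
```

The two (No, F) rows each get 2/5, and every other row gets 1/5. Multiply by 100 if you want the axis to read as percent rather than as a fraction.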
Data (the original posted data, with only two No responses changed to Yes)
from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
txt = '''
State Response Coverage Education "Effective To Date" EmploymentStatus Gender "Location Code" "Marital Status" "Policy Type" Policy "Renew Offer Type" "Sales Channel" "Vehicle Class" "Vehicle Size"
0 Washington No Basic Bachelor "2/24/11" Employed F Suburban Married "Corporate Auto" "Corporate L3" Offer1 Agent "Two-Door Car" Medsize
1 Arizona No Extended Bachelor "1/31/11" Unemployed F Suburban Single "Personal Auto" "Personal L3" Offer3 Agent "Four-Door Car" Medsize
2 Nevada Yes Premium Bachelor "2/19/11" Employed F Suburban Married "Personal Auto" "Personal L3" Offer1 Agent "Two-Door Car" Medsize
3 California No Basic Bachelor "1/20/11" Unemployed M Suburban Married "Corporate Auto" "Corporate L2" Offer1 "Call Center" SUV Medsize
4 Washington Yes Basic Bachelor "2/3/11" Employed M Rural Single "Personal Auto" "Personal L1" Offer1 Agent "Four-Door Car" Medsize'''
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")
Plot (single figure of multiple plots across two columns)
fig = plt.figure(figsize=(10,30))
for i, col in enumerate(df_categorical.columns):
    # PERCENT COLUMN CALCULATION
    df_categorical[col+'_pct'] = df_categorical.groupby(['Response', col])[col]\
                                     .transform(lambda x: len(x)) / len(df_categorical)
    plt.subplot(8, 2, i+1)
    sns.barplot(x=col, y=col+'_pct', hue='Response', data=df_categorical)\
       .set(xlabel=col, ylabel='Percent')
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
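If you would rather not add a `_pct` column to the dataframe for every variable, one possible variant is to build a small aggregated table per column and plot that instead. The sketch below is only an illustration under assumptions: `percent_table` is a hypothetical helper name, and `toy` is made-up data rather than the posted dataset. It again reports each (Response, value) pair as a percentage of all rows.

```python
import pandas as pd

def percent_table(df, col, hue="Response"):
    """Return one row per (hue, col) pair with its share of all rows, in percent.

    Hypothetical helper for illustration; it does not modify df.
    """
    return (
        df.groupby([hue, col]).size()   # row count per (hue, col) pair
          .mul(100).div(len(df))        # convert counts to percent of all rows
          .reset_index(name="Percent")  # tidy frame: hue, col, Percent
    )

# Hypothetical toy data, only for illustrating the helper
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

tbl = percent_table(toy, "Gender")
print(tbl)
```

Inside the plotting loop, the resulting table could then be passed to seaborn as `sns.barplot(x=col, y='Percent', hue='Response', data=percent_table(df_categorical, col))`, skipping the iteration where `col` is `'Response'` itself.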