How to normalize a seaborn countplot with multiple

Posted 2019-08-02 07:00

Question:

I have created a seaborn countplot for multiple categorical variables of a dataframe, but I want the bars to show percentages instead of counts.

What is the best option to use? Barplots? Can I use a loop like the one below to draw all the barplots at once?

for i, col in enumerate(df_categorical.columns):
   plt.figure(i)
   sns.countplot(x=col,hue='Response',data=df_categorical) 

This loop gives me the countplots for all variables at once.

Thanks!

Data looks like this:

    State           Response     Coverage   Education   Effective To Date   EmploymentStatus       Gender   Location Code   Marital Status  Policy Type Policy    Renew Offer Type  Sales Channel   Vehicle Class   Vehicle Size    
0   Washington  No  Basic   Bachelor    2/24/11 Employed    F   Suburban    Married Corporate Auto  Corporate L3    Offer1  Agent   Two-Door Car    Medsize  
1   Arizona     No  Extended    Bachelor    1/31/11 Unemployed  F   Suburban    Single  Personal Auto   Personal L3 Offer3  Agent   Four-Door Car   Medsize
2   Nevada      No  Premium Bachelor    2/19/11 Employed    F   Suburban    Married Personal Auto   Personal L3 Offer1  Agent   Two-Door Car    Medsize
3   California  No  Basic   Bachelor    1/20/11 Unemployed  M   Suburban    Married Corporate Auto  Corporate L2    Offer1  Call Center SUV Medsize
4   Washington  No  Basic   Bachelor    2/3/11  Employed    M   Rural   Single  Personal Auto   Personal L1 Offer1  Agent   Four-Door Car   Medsize

Answer 1:

Consider using groupby.transform to calculate a percentage column for each variable, then call barplot with the original column as x and the percentage column as y.
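To see what the transform step produces before any plotting, here is a minimal sketch on a made-up five-row frame. The `toy` frame, its values, and the `Gender_pct` column name are assumptions for illustration, not the poster's data. It uses `transform("size")`, which should give the same group sizes as a `lambda x: len(x)` transform.

```python
import pandas as pd

# Hypothetical toy data (not the poster's dataset)
toy = pd.DataFrame({
    "Gender":   ["F", "F", "F", "M", "M"],
    "Response": ["No", "No", "Yes", "No", "Yes"],
})

# Size of each (Response, Gender) group, broadcast back onto every row
group_size = toy.groupby(["Response", "Gender"])["Gender"].transform("size")

# Share of all rows that falls in the row's group, as a percentage
toy["Gender_pct"] = 100 * group_size / len(toy)

print(toy)
```

Every row in the same (Response, Gender) group carries the same percentage, so a barplot of `Gender_pct` against `Gender` with `hue="Response"` draws one bar per group at that height.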

Data (the original posted data, with two of the No responses changed to Yes)

from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

txt = '''
    State           Response     Coverage   Education   "Effective To Date"   EmploymentStatus       Gender   "Location Code"   "Marital Status"  "Policy Type" Policy    "Renew Offer Type"  "Sales Channel"   "Vehicle Class"   "Vehicle Size" 
0   Washington  No  Basic   Bachelor    "2/24/11" Employed    F   Suburban    Married "Corporate Auto"  "Corporate L3"    Offer1  Agent   "Two-Door Car"    Medsize  
1   Arizona     No  Extended    Bachelor  "1/31/11"   Unemployed  F   Suburban    Single  "Personal Auto"   "Personal L3" Offer3  Agent   "Four-Door Car"   Medsize
2   Nevada      Yes  Premium Bachelor    "2/19/11" Employed    F   Suburban    Married "Personal Auto"   "Personal L3" Offer1  Agent   "Two-Door Car"    Medsize
3   California  No  Basic   Bachelor    "1/20/11" Unemployed  M   Suburban    Married "Corporate Auto"  "Corporate L2"    Offer1  "Call Center" SUV Medsize
4   Washington  Yes  Basic   Bachelor    "2/3/11"  Employed    M   Rural   Single  "Personal Auto"   "Personal L1" Offer1  Agent   "Four-Door Car"   Medsize'''

# Raw string avoids an invalid-escape warning for \s; quoted fields keep their spaces
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")

Plot (single figure of multiple plots across two columns)

fig = plt.figure(figsize=(10,30))

for i, col in enumerate(df_categorical.columns):   
   # PERCENT COLUMN CALCULATION
   # Size of each (Response, col) group as a share of all rows, scaled to 0-100
   df_categorical[col+'_pct'] = df_categorical.groupby(['Response', col])[col]\
                                   .transform(lambda x: len(x)) / len(df_categorical) * 100

   plt.subplot(8, 2, i+1)   
   sns.barplot(x=col, y=col+'_pct', hue='Response', data=df_categorical)\
          .set(xlabel=col, ylabel='Percent')    

plt.tight_layout()
plt.show()
plt.clf()

plt.close('all')
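The loop above divides each group size by the total number of rows, so the bars of one plot sum to the whole dataset. If the goal is instead the share of each Response within a category, one possible alternative is `pd.crosstab` with its `normalize` argument. The sketch below uses a made-up frame (the `toy` data and the variable names are assumptions, not the poster's dataset) to contrast the two denominators.

```python
import pandas as pd

# Hypothetical toy data (not the poster's dataset)
toy = pd.DataFrame({
    "Coverage": ["Basic", "Basic", "Basic", "Extended", "Premium"],
    "Response": ["No", "No", "Yes", "No", "Yes"],
})

# normalize="index": percentages within each Coverage level (each row sums to 100)
within_category = pd.crosstab(toy["Coverage"], toy["Response"], normalize="index") * 100

# normalize="all": percentages of all rows (the whole table sums to 100)
of_total = pd.crosstab(toy["Coverage"], toy["Response"], normalize="all") * 100

print(within_category.round(1))
print(of_total.round(1))
```

Either table can then be drawn as grouped bars with pandas' own plotting, for example `within_category.plot(kind="bar")`.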