The Pareto diagram is very popular in Excel and Tableau. In Excel we can easily draw a Pareto diagram, but I have found no easy way to draw one in Python.
I have a pandas dataframe like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
print(df)
country
USA 177.0
Canada 7.0
Russia 4.0
UK 2.0
Belgium 2.0
Mexico 1.0
Germany 1.0
Denmark 1.0
How can I draw the Pareto diagram, perhaps using pandas, seaborn, or matplotlib?
So far I have been able to make a bar chart in descending order, but I still need to put a cumulative-sum line plot on top of it.
My attempt:
df.sort_values(by='country',ascending=False).plot.bar()
pareto chart for pandas.dataframe
Here is a more generalized version of ImportanceOfBeingErnest's code.
It also supports a Pareto chart with grouping according to a threshold. For example, if you set the threshold to 70, it will group the minor categories beyond 70 into one group called "Other".
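The code that accompanied this answer is not reproduced here, so the following is only a minimal sketch of the idea, not the answerer's actual function. It assumes the threshold is a cumulative percentage: categories are kept until the running share first reaches the threshold, and the rest are summed into "Other". The names `group_below_threshold` and `pareto_plot` are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def group_below_threshold(counts, threshold=None, other_label="Other"):
    """Sort descending; optionally merge the tail past `threshold` percent.

    Assumption: `threshold` is a cumulative percentage (0-100).
    """
    counts = counts.sort_values(ascending=False)
    if threshold is None:
        return counts
    cum_pct = counts.cumsum() / counts.sum() * 100
    # Keep categories up to and including the one that reaches the threshold.
    n_keep = int((cum_pct < threshold).sum()) + 1
    head, tail = counts.iloc[:n_keep], counts.iloc[n_keep:]
    if tail.empty:
        return head
    return pd.concat([head, pd.Series({other_label: tail.sum()})])


def pareto_plot(counts, threshold=None, ax=None):
    """Draw bars (left axis) and the cumulative percentage line (right axis)."""
    data = group_below_threshold(counts, threshold)
    cum_pct = data.cumsum() / data.sum() * 100
    if ax is None:
        _, ax = plt.subplots()
    labels = list(data.index)
    ax.bar(labels, data.values, color="C0")
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(labels, cum_pct.values, color="C1", marker="o")
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax2.set_ylim(0, 105)
    ax.tick_params(axis="x", rotation=45)
    return ax, ax2


df = pd.DataFrame(
    {"country": [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]},
    index=["USA", "Canada", "Russia", "UK", "Belgium", "Mexico", "Germany", "Denmark"],
)
grouped = group_below_threshold(df["country"], threshold=95)
ax, ax2 = pareto_plot(df["country"], threshold=95)
```

With the data from the question and a threshold of 95, USA, Canada and Russia are kept and the remaining five countries are merged into "Other". Call `plt.show()` afterwards to display the figure.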
Another way is to use the secondary_y parameter instead of calling twinx().
The parameter use_index=True is needed because your index is your x axis in this case. Otherwise you could have used x='x_Variable'.
You would probably want to create a new column with the percentage in it and plot one column as a bar chart and the other as a line chart on twin axes.
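The two suggestions above can be combined in a short sketch: add a cumulative-percentage column, draw the counts as bars, then draw the percentage as a line on a secondary axis via `secondary_y=True`. The column name `cum_pct` is an assumption made for illustration, and restoring the x-limits is a precaution in case the line plot tightens them in some pandas versions.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {"country": [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]},
    index=["USA", "Canada", "Russia", "UK", "Belgium", "Mexico", "Germany", "Denmark"],
)

# Sort descending and add the cumulative percentage as a new column.
df = df.sort_values(by="country", ascending=False)
df["cum_pct"] = df["country"].cumsum() / df["country"].sum() * 100

# Bars on the primary (left) y-axis; the index supplies the x labels.
ax = df.plot.bar(y="country", use_index=True, legend=False)
xlim = ax.get_xlim()  # remember limits so the outer bars are not clipped

# Line on a secondary (right) y-axis, created by pandas instead of twinx().
df.plot(y="cum_pct", use_index=True, secondary_y=True, ax=ax,
        color="C1", marker="o", legend=False)
ax.set_xlim(xlim)

ax2 = ax.right_ax  # pandas attaches the secondary axes here
ax2.set_ylim(0, 110)
ax.set_ylabel("count")
ax2.set_ylabel("cumulative %")
```

Call `plt.show()` afterwards to display the figure.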