I'm trying to analyze the wine-quality dataset. There are two datasets: the red wine dataset and the white wine dataset. I combine them to form wine_df. I want to plot the number of wines for each (quality, color) pair as a bar chart, with red bars for red wine and white bars for white wine. But for some bars, the label and the color don't match. For example, the fourth bar is labeled (4, white), but it is colored red. What should I do? Thanks for your answer!
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

red_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-red.csv',
                       sep=';')
white_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-white.csv',
                         sep=';')

## Add a column to each dataframe to identify the wine color
red_wine['color'] = 'red'
white_wine['color'] = 'white'

## Combine the two dataframes
wine_df = pd.concat([red_wine, white_wine])

colors = ['red', 'white']
plt.style.use('ggplot')
counts = wine_df.groupby(['quality', 'color']).count()['pH']
counts.plot(kind='bar', title='Counts by Wine Color and quality', color=colors, alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.show()
```
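A minimal sketch of one way to keep labels and colors in sync, under stated assumptions rather than as a definitive fix: instead of passing a fixed two-element list and relying on it lining up with the bars, read each bar's color from the 'color' level of that bar's own index entry, then draw with matplotlib's Axes.bar, which accepts one color per bar. The function name plot_counts_by_color and its return values are hypothetical. It also swaps count()['pH'] for size(); the two give the same counts here only assuming the pH column has no missing values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_counts_by_color(wine_df: pd.DataFrame):
    """Hypothetical helper: bar chart of row counts per (quality, color).

    Each bar's color comes from the 'color' level of its own index entry,
    so a bar labeled (4, white) is always drawn white.
    """
    # size() counts rows per group; groupby sorts by quality, then color.
    counts = wine_df.groupby(['quality', 'color']).size()

    # One color per bar, taken from the bar's own label instead of a
    # fixed list that is assumed to alternate red/white.
    bar_colors = list(counts.index.get_level_values('color'))

    x = np.arange(len(counts))
    fig, ax = plt.subplots(figsize=(12, 5))
    # edgecolor keeps the white bars visible on a light background.
    ax.bar(x, counts.to_numpy(), color=bar_colors, edgecolor='black', alpha=0.7)
    ax.set_xticks(x)
    ax.set_xticklabels([f'({q}, {c})' for q, c in counts.index], rotation=90)
    ax.set_title('Counts by Wine Color and Quality')
    ax.set_xlabel('Quality and Color', fontsize=18)
    ax.set_ylabel('Count', fontsize=18)
    return ax, counts, bar_colors
```

With the wine_df built above, usage would be `ax, counts, bar_colors = plot_counts_by_color(wine_df)` followed by `plt.show()`. Because the color list is built from the same index that produces the tick labels, it stays aligned even if one wine type is missing a quality level.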