Plot different colors for different categorical levels

Posted 2019-01-10 07:09

I have a data frame, diamonds, with variables such as carat, price, and color. I want to draw a scatter plot of price against carat in which each diamond color is drawn in a different plot color.

This is easy in R with ggplot:

ggplot(aes(x=carat, y=price, color=color),  #by setting color=color, ggplot automatically draw in different colors
       data=diamonds) + geom_point(stat='summary', fun.y=median)


How can this be done in Python using matplotlib?

PS:

I know about auxiliary plotting packages such as seaborn and ggplot for Python, but I would rather not use them; I just want to find out whether the job can be done with matplotlib alone. ;P

4 answers
叛逆
#2 · 2019-01-10 07:28

Here's a succinct, generic solution that uses a seaborn color palette.

First find a color palette you like and optionally visualize it:

sns.palplot(sns.color_palette("Set2", 8))

Then you can use it with matplotlib doing this:

import matplotlib.pyplot as plt
import seaborn as sns

# Unique category labels: 'D', 'F', 'G', ...
color_labels = df['color'].unique()

# List of RGB triplets, one per label so that no label is left unmapped
rgb_values = sns.color_palette("Set2", len(color_labels))

# Map label to RGB
color_map = dict(zip(color_labels, rgb_values))

# Finally use the mapped values
plt.scatter(df['carat'], df['price'], c=df['color'].map(color_map))
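
Since the question asks for matplotlib alone, the same label-to-color mapping can be sketched without seaborn. This is only an illustration under stated assumptions: the small DataFrame is made-up stand-in data, and the "Set2" colormap and the proxy legend handles are choices made for this example, not part of the answer above.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# Made-up stand-in for the diamonds data (illustrative values only).
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.2, 1.5],
    "price": [400, 900, 1800, 4000, 5500, 9000],
    "color": ["D", "E", "F", "D", "E", "F"],
})

# factorize() gives an integer code per row plus the unique labels.
codes, labels = pd.factorize(df["color"])

# Integer input indexes directly into a listed (qualitative) colormap.
cmap = plt.get_cmap("Set2")
point_colors = [cmap(code) for code in codes]

fig, ax = plt.subplots()
ax.scatter(df["carat"], df["price"], c=point_colors)

# Build one legend entry per category from proxy artists.
handles = [
    Line2D([], [], marker="o", linestyle="", color=cmap(i), label=label)
    for i, label in enumerate(labels)
]
ax.legend(handles=handles, title="color")
```

Call plt.show() afterwards to display the figure. Note that "Set2" has only 8 entries, so with more categories a larger colormap would be needed to keep every category distinct.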
疯言疯语
#3 · 2019-01-10 07:38

You can pass plt.scatter a c argument which will allow you to select the colors. The code below defines a colors dictionary to map your diamond colors to the plotting colors.

import matplotlib.pyplot as plt
import pandas as pd

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]

df = pd.DataFrame(dict(carat=carat, price=price, color=color))

fig, ax = plt.subplots()

colors = {'D':'red', 'E':'blue', 'F':'green', 'G':'black'}

ax.scatter(df['carat'], df['price'], c=df['color'].apply(lambda x: colors[x]))

plt.show()

df['color'].apply(lambda x: colors[x]) effectively maps the colours from "diamond" to "plotting".
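
As a side note, the same mapping can be written with Series.map, which takes the dictionary directly. One difference worth knowing: a label missing from the dictionary raises KeyError in the lambda version, while map yields NaN for it. The sketch below assumes a label 'H' with no assigned color and uses 'grey' as a hypothetical fallback; neither is part of the original example.

```python
import pandas as pd

colors = {'D': 'red', 'E': 'blue', 'F': 'green', 'G': 'black'}

# 'H' is deliberately absent from the colors dictionary.
levels = pd.Series(['D', 'E', 'G', 'H'])

# map() looks each label up in the dict; unknown labels become NaN.
mapped = levels.map(colors)

# Labels that still need a color assigned.
missing = levels[mapped.isna()].unique()

# Hypothetical fallback so that every point has a valid color.
safe = mapped.fillna('grey')
```

The resulting safe Series can be passed to the c argument of ax.scatter in the same way as the apply result.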

(Forgive me for not putting another example image up, I think 2 is enough :P)

With seaborn

You can use seaborn, a wrapper around matplotlib that makes plots look prettier by default (rather opinion-based, I know :P) and also adds some plotting functions.

For this you could use seaborn.lmplot with fit_reg=False (which prevents it from automatically doing some regression).

The code below uses an example dataset. By selecting hue='color' you tell seaborn to split your dataframe up based on your colours and then plot each one.

import matplotlib.pyplot as plt
import seaborn as sns

import pandas as pd

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]

df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Pass x and y by keyword; recent seaborn versions no longer accept them positionally
sns.lmplot(x='carat', y='price', data=df, hue='color', fit_reg=False)

plt.show()


Without seaborn using pandas.groupby

If you don't want to use seaborn, you can use pandas.groupby to split the data by color and then plot each group using just matplotlib. You'll have to assign the colors manually as you go. I've added an example below:

fig, ax = plt.subplots()

colors = {'D':'red', 'E':'blue', 'F':'green', 'G':'black'}

grouped = df.groupby('color')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='carat', y='price', label=key, color=colors[key])

plt.show()

This code assumes the same DataFrame as above and then groups it based on color. It then iterates over these groups, plotting for each one. To select a color I've created a colors dictionary which can map the diamond color (for instance D) to a real color (for instance red).
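
A possible variation, sketched under a few assumptions: the ggplot call in the question plots the median price for each carat value, and matplotlib can choose the colors itself if no color is passed. The sketch below rebuilds the same example DataFrame, aggregates with groupby, and calls ax.scatter once per group so that each call takes the next color from the default property cycle. The axis labels and legend title are additions for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Median price for every (color, carat) pair, mirroring the ggplot summary.
medians = df.groupby(['color', 'carat'], as_index=False)['price'].median()

fig, ax = plt.subplots()
for key, group in medians.groupby('color'):
    # No color argument: each scatter call uses the next color in the cycle.
    ax.scatter(group['carat'], group['price'], label=key)

ax.set_xlabel('carat')
ax.set_ylabel('price')
ax.legend(title='color')
```

In this toy data every (color, carat) pair occurs once, so the medians equal the raw prices; with the real diamonds data many rows would collapse into one point per pair.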

淡お忘
#4 · 2019-01-10 07:38

Here is a combination of markers and colors from a qualitative colormap in matplotlib:

import itertools
import numpy as np
from matplotlib import markers
import matplotlib.pyplot as plt

m_styles = markers.MarkerStyle.markers
N = 60
colormap = plt.cm.Dark2.colors   # Qualitative colormap
for i,(marker,color) in zip(range(N),itertools.product(m_styles, colormap)):
    plt.scatter(*np.random.random(2), color=color,marker=marker,label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=4);
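
To tie this back to categorical data, one way to hand each category its own marker and color pair is sketched below. The category list (the diamond color grades D to J) and the hand-picked subset of filled markers are assumptions for this example. It cycles both lists in parallel instead of using itertools.product, so that neighbouring categories differ in marker as well as in color.

```python
import itertools
import matplotlib.pyplot as plt

categories = ['D', 'E', 'F', 'G', 'H', 'I', 'J']       # assumed category labels
filled_markers = ['o', 's', '^', 'D', 'v', 'P', 'X']   # hand-picked subset
palette = plt.cm.Dark2.colors                          # 8 qualitative colors

# Pair the i-th marker with the i-th color, wrapping around each list.
pairs = zip(itertools.cycle(filled_markers), itertools.cycle(palette))
styles = dict(zip(categories, pairs))

fig, ax = plt.subplots()
for x, (label, (marker, color)) in enumerate(styles.items()):
    # One dummy point per category, just to show its style in the legend.
    ax.scatter(x, x, marker=marker, color=color, label=label)
ax.legend(title='category')
```

Because 7 markers and 8 colors share no common factor, this pairing only repeats after 56 categories.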

欢心
#5 · 2019-01-10 07:39

Using Altair.

# Altair 2+ API; the sample data comes from the separate vega_datasets package
import altair as alt
from vega_datasets import data

df = data.iris()
alt.Chart(df).mark_point().encode(x='petalLength', y='sepalLength', color='species')
