Is there a way to make matplotlib scatter plot mar

2019-03-30 04:12发布

问题:

I'm making scatterplots out of a DF using matplotlib. In order to get different colors for each data set, I'm making two separate calls to plt.scatter:

plt.scatter(zzz['HFmV'], zzz['LFmV'], label = dut_groups[0], color = 'r' )
plt.scatter(qqq['HFmV'], qqq['LFmV'], label = dut_groups[1], color = 'b' )
plt.legend()
plt.show()

This gives me the desired color dependence but really what would be ideal is if I could just get pandas to give me the scatterplot with several datasets on the same plot by something like

df.plot(kind = scatter(x,y, color = df.Group, marker = df.Head)

Apparently there is no such animal (at least that I could find). So, next best thing in my mind would be to put the plt.scatter calls into a loop where I could make the color or marker vary according to one of the rows (not x or y, but some other row. If the row I want to use were a continuous variable it looks like I could use a colormap, but in my case the row I need to sue for this is a string ( categorical type of variable, not a number).

Any help much appreciated.

回答1:

What you're doing will almost work, but you have to pass color a vector of colors, not just a vector of variables. So you could do:

color = df.Group.map({dut_groups[0]: "r", dut_groups[1]: "b"})
plt.scatter(x, y, color=color)

Same goes for the marker style

You could also use seaborn to do the color-mapping the way you expect (as discussed here), although it doesn't do marker style mapping:

import seaborn as sns
import pandas as pd
from numpy.random import randn

data = pd.DataFrame(dict(x=randn(40), y=randn(40), g=["a", "b"] * 20))
sns.lmplot("x", "y", hue="g", data=data, fit_reg=False)