I have the following plot in seaborn:
import pandas
import seaborn as sns
import matplotlib.pyplot as plt

df = pandas.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                       "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                       "rep": ["a", "b", "c", "a", "b", "c"]})
plt.figure()
ax = sns.stripplot(x="sample", y="value", edgecolor="none",
                   hue="sample", palette="Set1", data=df)
# how to plot median line?
plt.show()
It plots the points in grayscale colors instead of using Set1, and only shows X in the legend and not Y.
I also want to add a horizontal line at the median for X and Y. How can this be done? factorplot doesn't appear to have a horizontal line option.
You can draw the lines with matplotlib, and pandas can calculate the median values for your dataset. I use seaborn 0.7.0 in this example:
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
df = DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                "rep": ["a", "b", "c", "a", "b", "c"]})
# calc medians: select the numeric "value" column first, so the
# non-numeric columns are never passed to median()
xmed = df.loc[df["sample"] == "X", "value"].median()
ymed = df.loc[df["sample"] == "Y", "value"].median()
sns.stripplot(x="sample", y="value", edgecolor="none",
              hue="sample", palette="Set1", data=df)
# current x-axis limits, so each line spans the full plot width
x = plt.gca().get_xlim()
# draw one horizontal line per median
plt.plot(x, len(x) * [xmed], color=sns.xkcd_rgb["pale red"])
plt.plot(x, len(x) * [ymed], color=sns.xkcd_rgb["denim blue"])
plt.show()
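If there are more than two samples, writing one lookup per sample gets tedious. A minimal sketch of computing all the medians in one step with a pandas groupby, using the same example data (the variable name `medians` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                   "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                   "rep": ["a", "b", "c", "a", "b", "c"]})

# One median per sample, returned as a Series indexed by sample name
medians = df.groupby("sample")["value"].median()

for name, med in medians.items():
    print(name, med)
```

Each value in `medians` can then be passed to a `plt.plot` call like the ones above, in a loop over the samples.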
We can limit the width of each median line to its respective column by looping through the Axes ticks and tick labels after generating the stripplot. This also lets the code work independently of the number of samples (columns) to be plotted.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                   "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                   "rep": ["a", "b", "c", "a", "b", "c"]})
ax = sns.stripplot(x="sample", y="value", data=df, palette="Set1", s=8)
# distance across the "X" or "Y" stripplot column to span, in this case 40%
median_width = 0.4
for tick, text in zip(ax.get_xticks(), ax.get_xticklabels()):
    sample_name = text.get_text()  # "X" or "Y"
    # calculate the median value for all replicates of either X or Y
    median_val = df.loc[df["sample"] == sample_name, "value"].median()
    # plot horizontal lines across the column, centered on the tick
    ax.plot([tick - median_width / 2, tick + median_width / 2],
            [median_val, median_val], lw=4, color="k")
plt.show()
Figure: seaborn stripplot with median lines drawn.
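The same per-column lines can also be drawn in a single call with matplotlib's `Axes.hlines`. This is only a sketch: `add_median_lines` is a hypothetical helper, not part of seaborn, and it assumes the categories sit at integer x positions 0, 1, ... in the order given by `order`, which is how the stripplot above lays out its columns. To keep the block self-contained it draws on a bare Axes; in practice you would pass the Axes returned by `sns.stripplot`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def add_median_lines(ax, data, x, y, order, width=0.4, **line_kwargs):
    """Draw one short horizontal median line per category (hypothetical helper).

    Assumes category i in `order` is plotted at x position i.
    """
    # Medians per category, arranged in plotting order
    medians = data.groupby(x)[y].median().reindex(order)
    centers = range(len(order))
    xmin = [c - width / 2 for c in centers]
    xmax = [c + width / 2 for c in centers]
    # hlines draws all segments at once and returns a LineCollection
    return ax.hlines(medians.to_numpy(), xmin, xmax, **line_kwargs)


df = pd.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                   "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                   "rep": ["a", "b", "c", "a", "b", "c"]})

fig, ax = plt.subplots()
lines = add_median_lines(ax, df, "sample", "value", order=["X", "Y"],
                         colors="k", linewidths=4)
```

Passing `order` explicitly avoids reading the tick labels back from the Axes, at the cost of having to keep it in sync with the order used for the plot itself.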