Drawing points with median lines in seaborn u

Posted 2019-06-03 16:23

Question:

I have the following plot in seaborn:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns

df = pandas.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                       "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                       "rep": ["a", "b", "c", "a", "b", "c"]})
plt.figure()
ax = sns.stripplot(x="sample", y="value", edgecolor="none",
                   hue="sample", palette="Set1", data=df)

# how to plot median line?
plt.show()

It plots the points in grayscale instead of using the Set1 palette, and the legend shows only X, not Y.

I also want to add a horizontal line at the median for X and Y. How can this be done? factorplot doesn't appear to have a horizontal-line option.

Answer 1:

You can draw the lines with matplotlib and let pandas compute the median value of each sample. This example uses seaborn 0.7.0:

from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns

df = DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                "rep": ["a", "b", "c", "a", "b", "c"]})

# compute the median of "value" for each sample; selecting the column first
# avoids taking a median over the string columns
xmed = df.loc[df["sample"] == "X", "value"].median()
ymed = df.loc[df["sample"] == "Y", "value"].median()

ax = sns.stripplot(x="sample", y="value", edgecolor="none",
                   hue="sample", palette="Set1", data=df)

# x-axis limits of the stripplot, used as the endpoints of each line
x = ax.get_xlim()

# draw one horizontal line per median
ax.plot(x, [xmed] * len(x), color=sns.xkcd_rgb["pale red"])
ax.plot(x, [ymed] * len(x), color=sns.xkcd_rgb["denim blue"])

# keep the original limits in case plotting the lines rescaled the axis
ax.set_xlim(x)
plt.show()
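
The same idea can be written so that it does not need one hand-written line per sample. The following is only a minimal sketch, assuming the same data as above: `groupby` computes every median in one step, and a small helper (the name `add_median_lines` is made up for this illustration) draws each one with matplotlib's `Axes.axhline`, which spans the full width of the axes without needing the x-limits.

```python
import pandas as pd

df = pd.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                   "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8]})

# One median per sample, indexed by sample name (sorted alphabetically).
medians = df.groupby("sample")["value"].median()


def add_median_lines(ax, medians):
    """Draw a full-width horizontal line for each median on a matplotlib Axes.

    Hypothetical helper for illustration; colors come from matplotlib's
    default color cycle ("C0", "C1", ...), not from the Set1 palette.
    """
    for i, (name, med) in enumerate(medians.items()):
        ax.axhline(med, color=f"C{i}", label=f"median {name}")
```

To use it, call `add_median_lines(ax, medians)` on the Axes returned by `sns.stripplot(...)` before `plt.show()`. Because `axhline` is defined in axes coordinates along x, the lines should stay full-width even if the x-limits change later.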



Answer 2:

We can limit the width of each median line to its respective column by looping through the Axes ticks and tick labels after generating the stripplot. This also lets the code work independently of the number of samples (columns) to be plotted.


    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.DataFrame({"sample": ["X", "X", "X", "Y", "Y", "Y"],
                       "value": [0.2, 0.3, 0.4, 0.7, 0.75, 0.8],
                       "rep": ["a", "b", "c", "a", "b", "c"]})

    ax = sns.stripplot(x="sample", y="value", data=df, palette="Set1", size=8)

    # width of each median line as a fraction of one stripplot column (here 40%)
    median_width = 0.4

    for tick, text in zip(ax.get_xticks(), ax.get_xticklabels()):
        sample_name = text.get_text()  # "X" or "Y"

        # calculate the median value for all replicates of either X or Y
        median_val = df[df['sample']==sample_name].value.median()

        # plot horizontal lines across the column, centered on the tick
        ax.plot([tick-median_width/2, tick+median_width/2], [median_val, median_val],
                lw=4, color='k')

    plt.show()

[Figure: seaborn stripplot with median lines drawn]
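
The loop above reads the category names back from the tick labels. A variation that does not depend on the tick labels is to fix the category order yourself and compute the line endpoints directly. The sketch below is an illustration under one assumption: that category number i is drawn at x = i, which is seaborn's default placement for categorical plots. The helper names `median_segments` and `draw_segments` are invented for this example.

```python
from statistics import median


def median_segments(samples, values, order, width=0.4):
    """Return {name: (x_start, x_end, median)} for each category in `order`.

    Assumes the i-th category in `order` is plotted at x = i.
    """
    segments = {}
    for position, name in enumerate(order):
        group = [v for s, v in zip(samples, values) if s == name]
        segments[name] = (position - width / 2,
                          position + width / 2,
                          median(group))
    return segments


def draw_segments(ax, segments, **line_kwargs):
    """Draw each segment on a matplotlib Axes with Axes.hlines."""
    for x_start, x_end, y in segments.values():
        ax.hlines(y, x_start, x_end, **line_kwargs)


samples = ["X", "X", "X", "Y", "Y", "Y"]
values = [0.2, 0.3, 0.4, 0.7, 0.75, 0.8]
segments = median_segments(samples, values, order=["X", "Y"])
```

With the DataFrame from this answer, you would pass `df["sample"]` and `df["value"]`, give the same list as `order=` to `sns.stripplot` so the positions match, and then call `draw_segments(ax, segments, colors="k", linewidth=4)`.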