Matplotlib Seaborn Multiple Plots Formatting

Posted 2019-07-25 03:05

Question:

I am translating a set of R visualizations to Python. The target is the following R figure, which contains multiple histograms:

Using a combination of Matplotlib and Seaborn, and with the help of a kind StackOverflow member (see the link: Python Seaborn Distplot Y value corresponding to a given X value), I was able to create the following Python plot:

I am satisfied with its appearance, except that I don't know how to put the header information in the plots. Here is the Python code that creates the charts:

""" Program to draw the sampling histogram distributions """
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns

def main():
    """ Main routine for the sampling histogram program """
    sns.set_style('whitegrid')
    markers_list = ["s", "o", "*", "^", "+"]
    # create the data dataframe as df_orig
    df_orig = pd.read_csv('lab_samples.csv')
    df_orig = df_orig.loc[df_orig.hra != -9999]
    hra_list_unique = df_orig.hra.unique().tolist()
    # create and subset df_hra_colors to match the actual hra colors in df_orig
    df_hra_colors = pd.read_csv('hra_lookup.csv')
    df_hra_colors['hex'] = np.vectorize(rgb_to_hex)(df_hra_colors['red'], df_hra_colors['green'], df_hra_colors['blue'])
    df_hra_colors.drop(labels=['red', 'green', 'blue'], axis=1, inplace=True)
    df_hra_colors = df_hra_colors.loc[df_hra_colors['hra'].isin(hra_list_unique)]

    # hard coding the current_component to pc1 here, we will extend it by looping
    # through the list of components
    current_component = 'pc1'
    num_tests = 5
    df_columns = df_orig.columns.tolist()
    start_index = 5
    for test in range(num_tests):
        current_tests_list = df_columns[start_index:(start_index + num_tests)]
        # now create the sns distplots for each HRA color and overlay the tests
        i = 1
        for _, row in df_hra_colors.iterrows():
            plt.subplot(3, 3, i)
            select_columns = ['hra', current_component] + current_tests_list
            df_current_color = df_orig.loc[df_orig['hra'] == row['hra'], select_columns]
            y_data = df_current_color.loc[df_current_color[current_component] != -9999, current_component]
            axs = sns.distplot(y_data, color=row['hex'],
                               hist_kws={"ec":"k"},
                               kde_kws={"color": "k", "lw": 0.5})
            data_x, data_y = axs.lines[0].get_data()
            axs.text(0.0, 1.0, row['hra'], horizontalalignment="left", fontsize='x-small',
                     verticalalignment="top", transform=axs.transAxes)
            for current_test_index, current_test in enumerate(current_tests_list):
                # this_x defines the series of current_component(pc1,pc2,rhob) for this test
                # indicated by 1, corresponding R program calls this test_vector
                x_series = df_current_color.loc[df_current_color[current_test] == 1, current_component].tolist()
                for this_x in x_series:
                    this_y = np.interp(this_x, data_x, data_y)
                    axs.plot([this_x], [this_y - current_test_index * 0.05],
                             markers_list[current_test_index], markersize = 3, color='black')
            axs.xaxis.label.set_visible(False)
            axs.xaxis.set_tick_params(labelsize=4)
            axs.yaxis.set_tick_params(labelsize=4)
            i = i + 1
        start_index = start_index + num_tests
    # plt.show()
    pp = PdfPages('plots.pdf')
    pp.savefig()
    pp.close()

def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

if __name__ == "__main__":
    main()

The Pandas code works fine and does what it is supposed to. The bottleneck is my lack of knowledge and experience with 'PdfPages' in Matplotlib. How can I show in Python/Matplotlib/Seaborn the header information that appears in the corresponding R visualization? By header information, I mean what the R visualization has at the top, before the histograms, i.e., 'pc1', MRP, XRD, ....

I can get these values easily from my program, e.g., current_component is 'pc1', etc. But I don't know how to format the plots with the header. Can someone provide some guidance?

Answer 1:

You may be looking for a figure title or super title, fig.suptitle:

fig.suptitle('this is the figure title', fontsize=12)

In your case you can easily get the figure with plt.gcf(), so try

plt.gcf().suptitle("pc1")
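As a rough illustration of the super title, here is a minimal sketch on a 3x3 grid like the one in the question. The random data, the figure size, and the `subplots_adjust` value are assumptions made only so the example runs on its own; they do not come from the question's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # placeholder data, not the lab samples

# 3x3 grid of histograms, mirroring plt.subplot(3, 3, i) in the question
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for ax in axes.flat:
    ax.hist(rng.normal(size=200), bins=20, edgecolor="k")

# The component name becomes the figure-level title
current_component = "pc1"
title = fig.suptitle(current_component, fontsize=12)

# Leave some room at the top so the title does not overlap the first row
fig.subplots_adjust(top=0.90)
```

Here `fig.suptitle` is called on an explicit figure object; `plt.gcf().suptitle(...)` does the same thing when only the pyplot interface is used.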

The rest of the information in the header would be called a legend. For the following, let's suppose all subplots have the same markers. It would then suffice to create a legend for one of the subplots. To create legend labels, you can pass the label argument to the plot call, i.e.

axs.plot( ... , label="MRP")

When later calling axs.legend() a legend will automatically be generated with the respective labels. Ways to position the legend are detailed e.g. in this answer.
Here, you may want to place the legend in terms of figure coordinates, i.e.

axs.legend(loc="lower center", bbox_to_anchor=(0.5, 0.8), bbox_transform=plt.gcf().transFigure)
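Putting the pieces together, the following is a sketch of how the title, the labels, and a legend placed in figure coordinates might be combined and written to a PDF page. The data is random, the test names after "MRP" and "XRD" are hypothetical placeholders, and the output file name and the anchor/spacing values are assumptions that will likely need tuning for the real figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

markers_list = ["s", "o", "*", "^", "+"]
# "MRP" and "XRD" appear in the question; the remaining names are hypothetical
test_names = ["MRP", "XRD", "test3", "test4", "test5"]

rng = np.random.default_rng(0)  # placeholder data, not the lab samples
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for ax in axes.flat:
    ax.hist(rng.normal(size=200), bins=20, density=True, edgecolor="k")
    for idx, (marker, name) in enumerate(zip(markers_list, test_names)):
        x = rng.normal(size=3)
        y = np.full(3, -0.02 * (idx + 1))
        # One plot call per test, so each label yields a single legend entry
        ax.plot(x, y, marker, markersize=3, color="black", label=name)

fig.suptitle("pc1", fontsize=12)

# Build the legend from the first subplot only, but anchor it in figure
# coordinates so it sits under the title, above all subplots
legend = axes.flat[0].legend(
    loc="lower center",
    bbox_to_anchor=(0.5, 0.9),
    bbox_transform=fig.transFigure,
    ncol=len(test_names),
    fontsize="x-small",
    frameon=False,
)
fig.subplots_adjust(top=0.88)  # keep the subplots clear of the header area

# Passing the figure explicitly makes it clear which figure goes on the page
with PdfPages("plots_with_header.pdf") as pdf:
    pdf.savefig(fig)
    n_pages = pdf.get_pagecount()
plt.close(fig)
```

Note that the code in the question calls axs.plot once per point. Passing label in each of those calls would add one legend entry per point, so plotting all points of a test in a single call, as in this sketch, keeps the legend to one entry per test.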