python - seaborn: share X label not working as exp

Posted 2019-02-16 03:16

Question:

I am working with a dataset that describes relationships between pairs of points, such as bus stops. For example, we have bus stops A, B, C, and D.

I want to make a plot that shows, for each bus stop, how long it takes to get to the other 3 bus stops.

Obviously, there is no time from A to A, therefore, that should be blank.

When I plot it, the first row shows B, C, D, the second row shows A, C, D, and so on. The columns are misaligned, and the colors don't represent the same column in each row.

If I add sharex=True, it just removes the x labels from every axis except the last one. That's obviously not what I want to see here.

I would instead like to see 4 columns in the order of A, B, C, D. When it's A to A, it should just be blank, and the colors should be consistent.

Does anyone know how to accomplish this?

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

time=np.random.randn(1000)
point1 = ['A','B','C','D'] * 250
point2 = ['A'] * 250 + ['B'] * 250 + ['C'] * 250 + ['D'] * 250 

df_time = pd.DataFrame(
    {'point1': point1,
     'point2': point2,
     'time': time
    })
df_time = df_time[df_time['point1'] != df_time['point2']]  # drop rows where both points are the same stop

fig, ax = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(12, 16)
for point1i, axi in zip(point1, ax.ravel()):
    sns.boxplot(data=df_time[df_time['point1']==point1i], x='point2', y='time', ax=axi)

Answer 1:

As seen in the documentation, sns.boxplot has an order argument:

order, hue_order : lists of strings, optional
Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.

Using it like this

sns.boxplot(..., order=['A','B','C','D'])

would give you the desired plot.
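To see why the columns shift when order is not given, it may help to look at which point2 levels each per-row subset actually contains. The sketch below uses a small stand-in DataFrame (the names stops, pairs, df and inferred are illustrative, not from the question) and only pandas, so it shows the data side of the problem rather than seaborn's internals.

```python
import pandas as pd

# Small stand-in for df_time: every ordered pair of distinct stops.
stops = ['A', 'B', 'C', 'D']
pairs = [(p1, p2) for p1 in stops for p2 in stops if p1 != p2]
df = pd.DataFrame(pairs, columns=['point1', 'point2'])

# Levels of point2 present in each subset, in order of first appearance.
# This is the information available when levels are inferred from the data.
inferred = {
    p1: list(df.loc[df['point1'] == p1, 'point2'].unique())
    for p1 in stops
}

for p1, levels in inferred.items():
    print(p1, levels)
```

Each subset is missing a different stop, so the first box in one row does not correspond to the first box in the next unless the same order list is passed to every call.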

Complete code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

time=np.random.randn(1000)
point1 = ['A','B','C','D'] * 250
point2 = ['A'] * 250 + ['B'] * 250 + ['C'] * 250 + ['D'] * 250 

df_time = pd.DataFrame(
    {'point1': point1,
     'point2': point2,
     'time': time
    })
df_time = df_time[df_time['point1'] != df_time['point2']]  # drop rows where both points are the same stop

fig, ax = plt.subplots(nrows=4, sharey=True)

for point1i, axi in zip(point1, ax.ravel()):
    sns.boxplot(data=df_time[df_time['point1']==point1i], x='point2', y='time', 
                ax=axi, order=['A','B','C','D'])

plt.tight_layout()    
plt.show()
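
A possible alternative, not part of the answer above: declare point2 as a pandas categorical with all four stops listed once. Filtering rows keeps the full category list, so every subset carries the same levels. seaborn generally takes its level order from a categorical column, but treat that as an assumption to verify on your version; passing order explicitly is the documented route. The names stops, df, subset and kept below are illustrative.

```python
import pandas as pd

# Small stand-in for df_time: every ordered pair of distinct stops.
stops = ['A', 'B', 'C', 'D']
pairs = [(p1, p2) for p1 in stops for p2 in stops if p1 != p2]
df = pd.DataFrame(pairs, columns=['point1', 'point2'])

# Declare the full set of stops once, in the desired display order.
df['point2'] = pd.Categorical(df['point2'], categories=stops)

# Row filtering does not drop unused categories, so the subset for
# stop A still lists 'A' as a (now empty) level.
subset = df[df['point1'] == 'A']
kept = list(subset['point2'].cat.categories)
print(kept)
```

With this in place, the per-row subsets could be passed to sns.boxplot as before; keeping order=['A','B','C','D'] in the call as well does no harm.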