Seaborn groupby pandas Series

Posted 2019-06-24 07:04

I want to visualize my data as box plots grouped by another variable, as shown in my terrible drawing:

[image: drawing of the desired plot]

To do this, I pass a pandas Series as the grouping variable:

import pandas as pd
import seaborn as sns
#example data for reproducibility
a = pd.DataFrame(
[
[2, 1],
[4, 2],
[5, 1],
[10, 2],
[9, 2],
[3, 1]
])

#converting second column to Series 
a.ix[:,1] = pd.Series(a.ix[:,1])
#Plotting by seaborn
sns.boxplot(a, groupby=a.ix[:,1])

And this is what I get:

[image: seaborn plot]

However, I expected two box plots, each describing only the first column, grouped by the corresponding values in the second column (the one converted to a Series). The plot above instead shows each column separately, which is not what I want.

1 answer
时光不老,我们不散
#2 · 2019-06-24 07:57

A column in a DataFrame is already a Series, so your conversion is not necessary. Furthermore, if you only want to use the first column for both box plots, you should pass only that column to seaborn.

So:

import pandas as pd
import seaborn as sns

#example data for reproducibility
df = pd.DataFrame(
[
[2, 1],
[4, 2],
[5, 1],
[10, 2],
[9, 2],
[3, 1]
], columns=['a', 'b'])

#Plotting by seaborn: the values of column a, one box per value of column b
sns.boxplot(df.a, groupby=df.b)

I changed your example a little: giving the columns labels makes it a bit clearer, in my opinion.
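
For readers on a recent seaborn release, here is a minimal sketch of the same plot. It assumes that `boxplot` accepts the `x`, `y`, and `data` keywords; the `groupby` keyword used above appears to belong to older seaborn releases. The helper names `group_summary` and `plot_grouped` are made up for illustration, and the summary only checks numerically what each box should show.

```python
import pandas as pd

# Same example data as above.
df = pd.DataFrame(
    [[2, 1], [4, 2], [5, 1], [10, 2], [9, 2], [3, 1]],
    columns=['a', 'b'])

# A DataFrame column is already a Series, so no conversion is needed.
print(isinstance(df['a'], pd.Series))


def group_summary(frame):
    """Hypothetical helper: median of column a within each value of b."""
    return frame.groupby('b')['a'].median()


def plot_grouped(frame):
    """Draw one box of column a for each distinct value of column b."""
    # Assumption: a seaborn release whose boxplot takes x=, y=, and data=.
    # Imported here so the summary above can be used without seaborn.
    import seaborn as sns
    return sns.boxplot(x='b', y='a', data=frame)


print(group_summary(df))
```

Calling `plot_grouped(df)` should then draw two boxes, one for each value of `b`.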

Edit:

If you want to plot all columns separately, you (I think) basically want all combinations of the values in your groupby column and every other column. So if your DataFrame looks like this:

    a   b  grouper
0   2   5        1
1   4   9        2
2   5   3        1
3  10   6        2
4   9   7        2
5   3  11        1

and you want box plots for columns a and b grouped by the column grouper, you should flatten the columns and change the groupby column to contain values like a1, a2, b1, and so on.

Here is a crude way which I think should work, given the DataFrame shown above:

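# Pivot on grouper: the columns become a two-level index of (column, grouper value)
dfpiv = df.pivot(index=df.index, columns='grouper')

# Flatten the two-level column index into labels such as a1, a2, b1, b2
cols_flat = [dfpiv.columns.levels[0][i] + str(dfpiv.columns.levels[1][j]) for i, j in zip(dfpiv.columns.labels[0], dfpiv.columns.labels[1])]
dfpiv.columns = cols_flat
# Stack into one long Series indexed by (row, flattened label)
dfpiv = dfpiv.stack(0)

# One box per flattened label
sns.boxplot(dfpiv, groupby=dfpiv.index.get_level_values(1))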

Perhaps there are fancier ways of restructuring the DataFrame. In particular, the flattening of the hierarchy after pivoting is hard to read; I don't like it.
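
One possibly tidier restructuring, sketched under the assumption of a recent pandas and seaborn: `DataFrame.melt` reshapes the frame into long form, and the `hue` keyword of `boxplot` handles the grouping, so the pivot and the manual flattening are not needed. The names `long_df`, `column`, `value`, `label`, and `plot_long` are made up for this example.

```python
import pandas as pd

# The DataFrame shown above.
df = pd.DataFrame({
    'a': [2, 4, 5, 10, 9, 3],
    'b': [5, 9, 3, 6, 7, 11],
    'grouper': [1, 2, 1, 2, 2, 1],
})

# Long form: one row per (original row, column) pair.
long_df = df.melt(id_vars='grouper', var_name='column', value_name='value')

# Optional: flattened labels such as a1, a2, b1, b2, as described above.
long_df['label'] = long_df['column'] + long_df['grouper'].astype(str)


def plot_long(frame):
    """Draw boxes for every column, split by grouper via hue."""
    # Assumption: a seaborn release whose boxplot takes x=, y=, hue=, data=.
    import seaborn as sns
    return sns.boxplot(x='column', y='value', hue='grouper', data=frame)


print(long_df)
```

Calling `plot_long(long_df)` should draw the boxes for a and b side by side, colored by grouper. Passing `x='label'` without `hue` would instead give one box per flattened label, which is closer to the pivot-based version.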
