Pandas - Create columns from column value, and fil

2019-07-19 09:20发布

问题:

I have a dataframe similar to below.

Index Time Weekday 0 21:10:00 Tuesday 1 21:15:00 Tuesday 2 21:20:00 Tuesday 3 21:20:00 Tuesday 4 21:25:00 Wednesday 5 21:25:00 Wednesday 6 21:30:00 Friday 7 21:35:00 Thursday 8 21:35:00 Wednesday 9 21:40:00 Wednesday 10 21:40:00 Wednesday 11 21:40:00 Monday

I want to put the weekdays into columns, and count how many times each time appears for each day, my goal is this:

Time Monday Tuesday Wednesday Thursday Friday 21:10:00 0 1 0 0 0 21:15:00 0 1 0 0 0 21:20:00 0 2 0 0 0 21:25:00 0 0 2 0 0 21:30:00 0 0 0 0 1 21:35:00 0 0 1 1 0 21:40:00 1 0 2 0 0

The reason for this is because I want to create a heatmap in seaborn, and I read my data has to be pivoted/shaped a certain way: https://stackoverflow.com/a/37790707/9384889

I know how to count how frequent each Time value appears, ignoring the weekday: df['Time'].value_counts() And I have been reading http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html But I cannot see how to combine these two ideas.

回答1:

Use groupby with size and unstack or crosstab alternative for reshape.

For change order of days need ordered Categorical or reindex by columns:

cats = ['Monday','Tuesday','Wednesday','Thursday','Friday']

df['Weekday'] = pd.Categorical(df['Weekday'], categories=cats, ordered=True)

df = df.groupby(['Time', 'Weekday']).size().unstack(fill_value=0)

df = df.groupby(['Time', 'Weekday']).size().unstack(fill_value=0).reindex(columns=cats)

Alternatives:

df = pd.crosstab(df['Time'], pd.Categorical(df['Weekday'], categories=cats, ordered=True))

df = pd.crosstab(df['Time'], df['Weekday']).reindex(columns=cats)

print (df)

col_0     Monday  Tuesday  Wednesday  Thursday  Friday
Time                                                  
21:10:00       0        1          0         0       0
21:15:00       0        1          0         0       0
21:20:00       0        2          0         0       0
21:25:00       0        0          2         0       0
21:30:00       0        0          0         0       1
21:35:00       0        0          1         1       0
21:40:00       1        0          2         0       0

Last use seaborn.heatmap:

import seaborn as sns

sns.heatmap(df, annot=True, fmt="g", cmap='viridis')