Pivot sort by time values - Pandas

2020-04-21 02:44发布

问题:

I want to pivot a df and display values based off time values, not column values.

df = pd.DataFrame({
    'Place' : ['John','Alan','Cory','Jim','John','Alan','Cory','Jim'],                                
    'Number' : ['2','3','5','5','3','4','6','6'],          
    'Code' : ['1','2','3','4','1','2','3','4'],                      
    'Time' : ['1904-01-01 08:00:00','1904-01-01 09:00:00','1904-01-02 01:00:00','1904-01-02 02:00:00','1904-01-01 08:10:00','1904-01-01 09:10:00','1904-01-02 01:10:00','1904-01-02 02:10:00'],                           
    })

df = df.pivot_table(index = 'Number', columns = 'Place', values = 'Time', aggfunc = 'first').fillna('')

Out:

Place                  Alan                 Cory                  Jim                 John
Number                                                                                    
2                                                                      1904-01-01 08:00:00
3       1904-01-01 09:00:00                                            1904-01-01 08:10:00
4       1904-01-01 09:10:00                                                               
5                            1904-01-02 01:00:00  1904-01-02 02:00:00                     
6                            1904-01-02 01:10:00  1904-01-02 02:10:00 

Intended Output:

Place                  John                 Alan                 Cory                  Jim
Number                                                                                    
2       1904-01-01 08:00:00                                                               
3       1904-01-01 08:10:00  1904-01-01 09:00:00                                          
4                            1904-01-01 09:10:00                                          
5                                                 1904-01-02 01:00:00  1904-01-02 02:00:00
6                                                 1904-01-02 01:10:00  1904-01-02 02:10:00             

Note: I've only added a dummy dates to differentiate for times after midnight. I will eventually drop the dates and just leave the times once the df is appropriately sorted.

回答1:

Unfortunately pivot_table sorting columns names by default and no paramater for avoid it. So possible solution is DataFrame.reindex by original unique values of column Place:

#if necessary convert to datetimes and sorting
df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values('Time')
df1 = df.pivot_table(index='Number',columns='Place',values='Time',aggfunc='first').fillna('')

df1 = df1.reindex(columns=df['Place'].unique())
print (df1)
Place                  John                 Alan                 Cory  \
Number                                                                  
2       1904-01-01 08:00:00                                             
3       1904-01-01 08:10:00  1904-01-01 09:00:00                        
4                            1904-01-01 09:10:00                        
5                                                 1904-01-02 01:00:00   
6                                                 1904-01-02 01:10:00   

Place                   Jim  
Number                       
2                            
3                            
4                            
5       1904-01-02 02:00:00  
6       1904-01-02 02:10:00  


标签: pandas pivot