How to sort pandas pivot_table based on newest dat

2019-07-13 19:09发布

I've created a DataFrame in my desired date order, however, when I put this into a pivot table the order changes.

I wanted to sort the pivot table base on the newest date of any of the rows within a given level

data = [['yellow',1,'02/01/2015'],
        ['yellow',2,'04/01/2015'],
        ['green',3,'03/01/2015'],
        ['red',4,'01/01/2015']]

    df = pd.DataFrame(data, columns=['colour','number','date'])
    df.pivot_table(index=['number','date'])

The result is

                    number
colour  date    
green   03/01/2015  3
red     01/01/2015  4
yellow  02/01/2015  1
        04/01/2015  2

I want the end result to be a list of colours which have newest dates to be at the top, basically a sort on the newest of the dates per row (the ones with the asterix around them). So the result would be:-

                    number
colour  date    
yellow  02/01/2015  2
        *04/01/2015*  3
green   *03/01/2015*    4
red     *01/01/2015*    1

I can think of three solutions but I can't work them out

a) get pivot_table to keep the original order b) do a sort on the pivot_table using a func along the lines of latest_date_in_rows c) create an extra column containing the latest date against each colour

not sure which is the right route to take in the world of pandas, but at the moment I'm stuck :(

1条回答
戒情不戒烟
2楼-- · 2019-07-13 19:21

You can remember old multiindex before pivoting and then reindex output dataframe by old multiindex.

import pandas as pd

data = [['yellow',1,'02/01/2015'],
        ['yellow',2,'04/01/2015'],
        ['green',3,'03/01/2015'],
        ['red',4,'01/01/2015']]
df = pd.DataFrame(data, columns=['colour','number','date'])
#simulate datetime column date
df['date'] = pd.to_datetime(df['date'])
#set index from columns colour and date
df = df.set_index(['colour', 'date'])
print df
#                   number
#colour date              
#yellow 2015-02-01       1
#       2015-04-01       2
#green  2015-03-01       3
#red    2015-01-01       4

#set old index to variable idx
idx = df.index
print df.index

#pivot table, it doesn't work with test data
df.pivot_table(index=['number','date'])

#reindex by old multiindex
df1 = df.reindex(idx)
print df1
#                   number
#colour date              
#yellow 2015-02-01       1
#       2015-04-01       2
#green  2015-03-01       3
#red    2015-01-01       4

EDIT:

I think problem is that original dataframe isn't sorted. Its multiindex is:

MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[2, 2, 0, 1], [1, 3, 2, 0]],
           names=[u'colour', u'date'])

Output dataframe has multiindex sorted by colour:

MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[0, 1, 2, 2], [2, 0, 1, 3]],
           names=[u'colour', u'date'])

And you can sorted by level date, but multiindex and output is:

idx1 = df.sortlevel(level='date').index
print idx1
MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[1, 2, 0, 2], [0, 1, 2, 3]],
           names=[u'colour', u'date'])


#reindex by idx1
df1 = df.reindex(idx)
                   number
colour date              
red    2015-01-01       4
yellow 2015-02-01       1
green  2015-03-01       3
yellow 2015-04-01       2

So solution is reindex by original multiindex.

查看更多
登录 后发表回答