Pandas MultiIndex custom sort levels by categorica

2020-07-24 06:21发布

问题:

I'm new to Pandas (0.16.1), and want custom sort in multiindex so i use Categoricals. Part of my multiindex:

Part  Defect Own
Кузов 504    ИП
Кузов 504    Итого
Кузов 504    ПС
Кузов 505    ПС
Кузов 506    ПС
Кузов 507    ПС
Кузов 530    ИП
Кузов 530    Итого
Кузов 530    ПС

I create pivot table with MultiIndex levels [Defect, Own]. Then i make "Own" Categorical (see p.s. part of question) to sort it as [ИП, ПС, Итого]. But when i prepend levels with "Part", which is also Categorical based on "Defect" level, and sort index with

pvt.sortlevel(0, inplace=True)

"Own" level is sorted in alphabetical order: [ИП, Итого, ПС]. How can i custom-sort two levels in multiindex?

P. S. I convert "Own" level to Categorical with the following code: create new column, replace index level with it. Is it ok?

def makeLevelCategorical(pdf, pname, cats):
    names = pdf.index.names
    namei = names.index(pname)
    pdf["tmp"] = pd.Categorical(pdf.index.get_level_values(pname), categories=cats) #New temp column
    pdf.set_index("tmp", append=True, inplace=True) #Append column to index
    pdf = pdf.reset_index(pname, drop=True) #Remove /pname/ level
    names2 = list(names)
    names2[namei] = "tmp"
    pdf.reorder_levels(names2)  #Put "tmp" level to /pname/'s position
    pdf.index.names = names     #Rename "tmp" level to /pname/
    return pdf

回答1:

Sorting a multiindex can be done using the Dataframe.sort_index function.

Here is a small example:

df = pd.DataFrame(
    {"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
     "i2":[1,3,2,2,1,1,2,2,1,1,3,2],
     "d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['i1', 'i2'], inplace=True)
df.sort_index()

Outputs:

        d1
i1  i2  
1   1   a
    2   c
    2   d
    3   b
2   1   e
    2   h
3   1   i
    1   j
    2   l
    3   k
4   1   f
    2   g

If you want to change the sort order on column basis, the Dataframe.sort_index function takes an argument ascending= which can be given a list of [True, False] statements corresponding to the columns in order.

Categorical is a new shiny dtype in pandas and it should be used, but it is not needed for this operation per se.

Edit due to comment:

Sort will always sort alphabetically or in reverse order. If you want custom sort, then you need to create a new column which can be sorted alphabetically but is a result of the column which can determine the sorting. Do this using Series.map, like this example, that sorts the datasets with vowels first:

mappings = {'a': 0, 'b':1, 'c':1, 'd':1,
            'e':0, 'f':1, 'g':1, 'h':1,
            'i':0, 'j':1, 'k': 1, 'l': 1}
df['sortby'] = df['d1'].map(mappings)
df.sort('sortby')

        d1  sortby
i1  i2      
1   1   a   0
2   1   e   0
3   1   i   0
1   3   b   1
    2   c   1
    2   d   1
4   1   f   1
    2   g   1
2   2   h   1
3   1   j   1
    3   k   1
    2   l   1

If you do not want the sortby column after that, you can simply delete it, like this:

del df['sortby']