I have a pandas DataFrame in which one column contains month names.
How can I do a custom sort using a dictionary, for example:
custom_dict = {'March':0, 'April':1, 'Dec':3}
The desired result is a DataFrame sorted in the order March, April, Dec.
A bit late to the game, but here's a way to create a function that sorts pandas Series, DataFrame, and multiindex DataFrame objects using arbitrary functions.
I make use of df.iloc[index], which selects rows of a Series/DataFrame by position (compared to df.loc, which selects by label). Using this, we just need a function that returns a sequence of positional indices.
You can use this to create custom sorting functions. This works on the dataframe used in Andy Hayden's answer:
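The original code did not survive here, so what follows is only a minimal sketch of the idea, not the answer's actual implementation. The helper name positions_sorted_by and the sample frame are assumptions made for illustration; the only pandas feature relied on is positional selection with df.iloc.

```python
import pandas as pd


def positions_sorted_by(values, key=None, reverse=False):
    """Return the integer positions that would sort `values`.

    Hypothetical helper (name and signature are illustrative only).
    `key` is any function applied to each value before comparison.
    """
    keyed = list(values) if key is None else [key(v) for v in values]
    # Sort the positions 0..n-1 by their key; sorted() is stable.
    return sorted(range(len(keyed)), key=keyed.__getitem__, reverse=reverse)


custom_dict = {'March': 0, 'April': 1, 'Dec': 3}

# Assumed sample data with a month column named 'm'.
df = pd.DataFrame(
    [[1, 2, 'March'], [5, 6, 'Dec'], [3, 4, 'April']],
    columns=['a', 'b', 'm'],
)

# Look up each month's rank in the dictionary, then reorder rows by position.
sorted_df = df.iloc[positions_sorted_by(df['m'], key=custom_dict.get)]
print(sorted_df)
```

Because the key is an ordinary Python callable, anything that maps a value to a comparable object can be passed in, at the cost of running in pure Python.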
This also works on multiindex DataFrames and Series objects:
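As a rough illustration of the same approach on other objects, here is a sketch with a Series and a MultiIndex DataFrame. The helper is the same hypothetical positions_sorted_by (repeated so the block runs on its own), and the data and level names are invented for the example.

```python
import pandas as pd


def positions_sorted_by(values, key=None):
    """Hypothetical helper: integer positions that would sort `values`."""
    keyed = list(values) if key is None else [key(v) for v in values]
    return sorted(range(len(keyed)), key=keyed.__getitem__)


custom_dict = {'March': 0, 'April': 1, 'Dec': 3}

# A plain Series of month names (assumed data).
s = pd.Series(['Dec', 'March', 'April'])
sorted_s = s.iloc[positions_sorted_by(s, key=custom_dict.get)]

# A DataFrame with a two-level index; iterating a MultiIndex yields tuples,
# so the key function picks the month out of each (group, month) label.
index = pd.MultiIndex.from_tuples(
    [('x', 'Dec'), ('x', 'March'), ('y', 'April')],
    names=['group', 'm'],
)
mi_df = pd.DataFrame({'value': [10, 20, 30]}, index=index)
sorted_mi = mi_df.iloc[
    positions_sorted_by(mi_df.index, key=lambda label: custom_dict[label[1]])
]
print(sorted_s)
print(sorted_mi)
```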
To me this feels clean, but it uses python operations heavily rather than relying on optimized pandas operations. I haven't done any stress testing but I'd imagine this could get slow on very large DataFrames. Not sure how the performance compares to adding, sorting, then deleting a column. Any tips on speeding up the code would be appreciated!
Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:
First make the month column a categorical and specify the ordering to use.
Now, when you sort the month column it will sort with respect to that list:
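A minimal sketch of the categorical approach, assuming a sample frame with a month column named 'm'. It uses sort_values, the sorting method in current pandas, which may differ from the call the answer originally showed.

```python
import pandas as pd

# Assumed sample data.
df = pd.DataFrame(
    [[1, 2, 'March'], [5, 6, 'Dec'], [3, 4, 'April']],
    columns=['a', 'b', 'm'],
)

# Make the month column an ordered categorical; the order of `categories`
# is the order used when sorting.
df['m'] = pd.Categorical(
    df['m'], categories=['March', 'April', 'Dec'], ordered=True
)

# Sorting now follows the category order instead of alphabetical order.
sorted_df = df.sort_values('m')
print(sorted_df)
```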
Note: if a value is not in the list it will be converted to NaN.
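To see that behaviour, here is a small sketch with an assumed extra month ('June') that is not among the categories:

```python
import pandas as pd

months = pd.Series(['March', 'June', 'April'])  # 'June' is not a category

cat_months = pd.Series(
    pd.Categorical(months, categories=['March', 'April', 'Dec'], ordered=True)
)

# The unknown value has become NaN.
missing = cat_months.isna().tolist()

# sort_values places NaN last by default (na_position='last').
sorted_positions = cat_months.sort_values().index.tolist()
print(missing, sorted_positions)
```

It may be worth checking for missing values after the conversion if the column could contain unexpected months.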
An older answer for those interested...
You could create an intermediary series, and set_index on that.
As commented, in newer pandas, Series has a replace method to do this more elegantly.
The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).
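The original snippets are missing here, so the following is only a sketch of the intermediary-series idea under assumed sample data. It uses Series.map to build the ranks and selects rows with df.loc rather than set_index, which is a substitution on my part; the second half illustrates the difference between a strict dictionary lookup and replace.

```python
import pandas as pd

custom_dict = {'March': 0, 'April': 1, 'Dec': 3}

# Assumed sample data.
df = pd.DataFrame(
    [[1, 2, 'March'], [5, 6, 'Dec'], [3, 4, 'April']],
    columns=['a', 'b', 'm'],
)

# Intermediary series: each month mapped to its rank in the dictionary.
ranks = df['m'].map(custom_dict)

# Sort the ranks, then pull the rows in that index order.
sorted_df = df.loc[ranks.sort_values().index]

# A value outside the dictionary: object dtype so the column can hold both
# the replaced ints and the untouched string.
with_unknown = pd.Series(['March', 'June'], dtype=object)

# A strict lookup raises on the unknown month...
try:
    with_unknown.apply(lambda month: custom_dict[month])
    lookup_raised = False
except KeyError:
    lookup_raised = True

# ...whereas replace leaves it unchanged.
replaced = with_unknown.replace(custom_dict)
print(sorted_df, lookup_raised, replaced.tolist())
```

Note that a column left with a mix of ranks and unreplaced strings cannot be sorted directly, so unknown values still need handling before sorting.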