I can get a boxplot of a salary column in a pandas DataFrame...
train.boxplot(column='PredictionError',by='Category',sym='')
...however, I can't figure out how to define the index order used for column 'Category'. I want to supply my own custom order, based on another criterion:
category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().sort_values().keys()
How can I apply my custom order to the boxplot columns (other than by kludging the column names with a prefix to force the ordering, which is ugly)?
'Category' is a string column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs'], so it can easily be factorized with pd.Categorical.from_array().
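A minimal sketch of that factorization, using a small made-up stand-in for the question's train DataFrame: passing the custom order as categories to the plain pd.Categorical constructor, rather than letting the categories be inferred (inferred categories come out sorted), makes the integer codes follow that order.

```python
import pandas as pd

# Made-up stand-in for the question's `train` DataFrame.
train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [30000, 22000, 45000, 24000, 28000, 50000],
})

# Custom order: categories sorted by ascending mean salary.
order = list(train.groupby("Category")["Salary"].mean().sort_values().index)

# Supplying `categories` explicitly makes the codes follow `order`
# instead of the alphabetical order that inference would produce.
cat = pd.Categorical(train["Category"], categories=order, ordered=True)
codes = cat.codes.tolist()
print(order)
print(codes)
```

With this sample, Admin Jobs (mean 23000) gets code 0 and Accounting & Finance Jobs (mean 47500) gets code 2.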
On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:
- pandas.core.frame.py:boxplot() is a passthrough to
- pandas.tools.plotting.py:boxplot() which instantiates ...
- matplotlib.pyplot.py:boxplot() which instantiates ...
- matplotlib.axes.py:boxplot()
I suppose I could either hack up a custom version of pandas' boxplot() or reach into the internals of the object, and also file an enhancement request.
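As a rough illustration of the first option, the helper below (ordered_boxplot is a hypothetical name, not part of pandas) skips DataFrame.boxplot() entirely: it collects each group's values in the caller's order and hands them to matplotlib's Axes.boxplot() in a single call. The data is made up and the order is written out by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def ordered_boxplot(df, column, by, order, ax=None):
    """Hypothetical helper: one box of df[column] per value of df[by], drawn in `order`.

    Every key in `order` must occur in df[by]; get_group raises KeyError otherwise.
    """
    if ax is None:
        _, ax = plt.subplots()
    grouped = df.groupby(by)[column]
    # Pull each group in the caller's order rather than groupby's sorted order.
    data = [grouped.get_group(key).dropna().to_numpy() for key in order]
    positions = list(range(1, len(order) + 1))
    ax.boxplot(data, positions=positions, sym="")  # sym="" hides fliers, as in the question
    ax.set_xticks(positions)
    ax.set_xticklabels(order, rotation=90)
    ax.set_xlabel(by)
    ax.set_title(column)
    return ax


train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "PredictionError": [0.10, -0.20, 0.30, 0.05, -0.10, 0.20],
})
order = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]
ax = ordered_boxplot(train, "PredictionError", "Category", order)
labels = [t.get_text() for t in ax.get_xticklabels()]
```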
EDIT: this question arose with pandas ~0.13 and has probably been obsoleted by recent (0.19+?) versions as per @Cireo's late answer.
Note that pandas can now create categorical columns. If you don't mind having all the columns present in your graph, or trimming them appropriately, you can do something like the below:
http://pandas.pydata.org/pandas-docs/stable/categorical.html
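A minimal sketch of the categorical approach, assuming a reasonably recent pandas and matplotlib, with made-up data standing in for train: once Category is an ordered categorical whose categories follow the custom order, groupby (which DataFrame.boxplot(by=...) relies on) iterates in category order rather than alphabetically. Categories with no rows would typically still get an empty slot, which is the "all the columns present" caveat; cat.remove_unused_categories() trims them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the question's `train` DataFrame.
train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [30000, 22000, 45000, 24000, 28000, 50000],
    "PredictionError": [0.10, -0.20, 0.30, 0.05, -0.10, 0.20],
})
order = list(train.groupby("Category")["Salary"].mean().sort_values().index)

# Ordered categorical whose categories follow the custom order;
# unused categories are dropped (none here, but harmless).
train["Category"] = pd.Categorical(train["Category"], categories=order, ordered=True)
train["Category"] = train["Category"].cat.remove_unused_categories()

fig, ax = plt.subplots()
train.boxplot(column="PredictionError", by="Category", sym="", ax=ax)

# groupby on the categorical yields groups in category order, not alphabetically.
group_order = [key for key, _ in train.groupby("Category", observed=True)]
```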
Recent pandas also appears to allow the positions argument to pass all the way through from frame to axes.
Adding a separate answer, which perhaps could be another question; feedback appreciated.
I wanted to add a custom column order within a groupby, which posed many problems for me. In the end, I had to avoid calling boxplot on a groupby object, and instead go through each subplot myself to provide explicit positions.
Within my final code, determining the positions was even slightly more involved because I had multiple data points for each sortby value, and I ended up having to do the below:
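A sketch of the general idea only, under made-up assumptions: a hypothetical Panel column decides which subplot a row goes to, each category gets a fixed x position from the custom order, and every subplot calls Axes.boxplot() with explicit positions, so a category missing from one panel leaves a gap instead of shifting the others.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; "Panel" is a hypothetical column that picks the subplot.
df = pd.DataFrame({
    "Panel": ["A", "A", "A", "A", "B", "B", "B"],
    "Category": ["Travel Jobs", "Admin Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Travel Jobs", "Travel Jobs", "Admin Jobs"],
    "PredictionError": [0.10, -0.20, 0.05, 0.30, -0.10, 0.15, 0.20],
})
order = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]
pos_of = {cat: i + 1 for i, cat in enumerate(order)}  # category -> fixed x position

groups = list(df.groupby("Panel"))
fig, axes = plt.subplots(1, len(groups), sharey=True, squeeze=False)
for ax, (panel, sub) in zip(axes[0], groups):
    seen = set(sub["Category"])
    present = [c for c in order if c in seen]  # categories in this panel, custom order
    data = [sub.loc[sub["Category"] == c, "PredictionError"].to_numpy() for c in present]
    ax.boxplot(data, positions=[pos_of[c] for c in present], sym="")
    # boxplot() sets ticks to the positions it drew; restore the full shared axis.
    ax.set_xticks(list(pos_of.values()))
    ax.set_xticklabels(order, rotation=90)
    ax.set_xlim(0.5, len(order) + 0.5)
    ax.set_title(f"Panel {panel}")
```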
Actually, I got stuck on the same question, and I solved it by making a map and resetting the xticklabels, with code as follows:
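One possible reading of that approach, sketched with made-up data (the CategoryRank column name is hypothetical): map each category name to its integer rank in the custom order, group by that integer column so pandas draws the boxes in rank order, then reset the x tick labels to the category names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "PredictionError": [0.10, -0.20, 0.30, 0.05, -0.10, 0.20],
})
order = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]

# The "map": category name -> integer rank in the custom order.
rank_of = {cat: i for i, cat in enumerate(order)}
train["CategoryRank"] = train["Category"].map(rank_of)

fig, ax = plt.subplots()
# Integer keys sort numerically, so the boxes come out in `order`...
train.boxplot(column="PredictionError", by="CategoryRank", sym="", ax=ax)
# ...and the rank tick labels are then replaced with the category names.
ax.set_xticklabels(order, rotation=90)
```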
It's hard to say how to do this without a working example. My first guess would be to just add an integer column holding the order that you want.
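For instance, assuming the mean-salary criterion from the question and made-up data, such an integer column can be derived without writing the order out by hand: broadcast each category's mean back onto its rows with transform() and dense-rank it. SalaryRank is a hypothetical column name.

```python
import pandas as pd

train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [30000, 22000, 45000, 24000, 28000, 50000],
})

# Each row gets its category's mean salary, then a dense integer rank
# (1 = lowest mean); categories with identical means would share a rank.
mean_salary = train.groupby("Category")["Salary"].transform("mean")
train["SalaryRank"] = mean_salary.rank(method="dense").astype(int)
print(train)
```

Passing by='SalaryRank' to boxplot() should then order the boxes by mean salary, although the tick labels will show the ranks rather than the category names.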
A simple, brute-force way would be to add each boxplot one at a time.
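A sketch of that brute-force loop, with made-up data and a hand-written order: one Axes.boxplot() call per category, each placed at its own position. Because each call manages the ticks for just its own box, the ticks, labels and x limits are set once at the end.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

train = pd.DataFrame({
    "Category": ["Travel Jobs", "Admin Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "PredictionError": [0.10, -0.20, 0.30, 0.05, -0.10, 0.20],
})
order = ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"]

fig, ax = plt.subplots()
for pos, cat in enumerate(order, start=1):
    values = train.loc[train["Category"] == cat, "PredictionError"].dropna().to_numpy()
    # One boxplot call per category, placed at its slot in the custom order.
    ax.boxplot(values, positions=[pos], sym="")

# Each call reset the ticks to its own single position; fix them up once here.
ax.set_xticks(list(range(1, len(order) + 1)))
ax.set_xticklabels(order, rotation=90)
ax.set_xlim(0.5, len(order) + 0.5)
```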