How do you create a new Bin/Bucket Variable using pd.qcut in Python?
This might seem elementary to experienced users, but I was not clear on it, and it was surprisingly hard to search for on Stack Overflow/Google. Some thorough searching yielded this (Assignment of qcut as new column), but it didn't quite answer my question because it didn't take the last step and put everything into numbered bins (i.e. 1, 2, ...).
In Pandas 0.15.0 or newer, `pd.qcut` will return a Series, not a Categorical, if the input is a Series (as it is in your case) or if `labels=False`. If you set `labels=False`, then `qcut` will return a Series with the integer indicators of the bins as values. So to future-proof your code, you could call `pd.qcut` with `labels=False`, or pass a NumPy array to `pd.qcut` so you get a Categorical as the return value. Note that the Categorical attribute `labels` is deprecated. Use `codes` instead.
EDIT: The answer below is only valid for versions of Pandas older than 0.15.0. If you are running Pandas 0.15.0 or higher, see:
Thanks to @unutbu for pointing it out. :)
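For anyone on Pandas 0.15.0 or newer, here is a minimal sketch of the two options described in the newer answer. The `spreads` values are made up for illustration and the variable names are my own assumptions, not taken from the original post. It uses `Series.to_numpy()`, which needs a reasonably recent Pandas; `.values` should work the same way on older versions.

```python
import pandas as pd

# Hypothetical data: ten made-up spread values (not from the original post).
spreads = pd.Series([0.5, 1.2, 0.3, 2.8, 1.9, 0.7, 3.1, 2.2, 1.5, 0.9])

# Option 1: labels=False makes qcut return the integer bin indicators directly.
bins_from_series = pd.qcut(spreads, 5, labels=False)

# Option 2: a NumPy array input yields a Categorical; its .codes attribute
# holds the integer bin indicators.
bins_from_array = pd.qcut(spreads.to_numpy(), 5).codes

print(bins_from_series.tolist())
print(bins_from_array.tolist())
```

Both options should give the same zero-based bin indicators for each observation.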
Say you have some data that you want to bin (in my case, options spreads), and you want to make a new variable with the bucket corresponding to each observation. The link mentioned above shows that you can do this with:
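The original snippet is not reproduced here, so the following is only a sketch of a plain `pd.qcut` call, assuming a hypothetical Series named `spreads` with made-up values and five quantile bins.

```python
import pandas as pd

# Hypothetical data: ten made-up spread values (not from the original post).
spreads = pd.Series([0.5, 1.2, 0.3, 2.8, 1.9, 0.7, 3.1, 2.2, 1.5, 0.9])

# Split the observations into 5 quantile-based bins.
# With a Series input on a modern Pandas, the result is a categorical Series
# whose values are the intervals (bin endpoints) each observation falls into.
binned = pd.qcut(spreads, 5)

print(binned)
```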
This gives you the bin endpoints that correspond to each observation. However, if you would like the corresponding bin number for each observation, you can do this:
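The snippet that originally followed relied on the pre-0.15.0 Categorical `labels` attribute, which is deprecated as noted above. As a rough modern counterpart, and assuming the same hypothetical `spreads` Series, the integer codes can be read through the `.cat` accessor:

```python
import pandas as pd

# Hypothetical data: ten made-up spread values (not from the original post).
spreads = pd.Series([0.5, 1.2, 0.3, 2.8, 1.9, 0.7, 3.1, 2.2, 1.5, 0.9])

# qcut on a Series returns a categorical Series of intervals.
binned = pd.qcut(spreads, 5)

# .cat.codes maps each interval to its zero-based bin number.
bin_numbers = binned.cat.codes

print(bin_numbers.tolist())
```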
Putting it all together, if you would like to create a new variable with just the bin numbers, this should suffice:
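As a sketch of that last step on a current Pandas version, assuming a hypothetical DataFrame `df` with a made-up `spread` column (both names are mine, not from the original post), the bin numbers can be stored in a new column. Adding 1 is optional; it just shifts the zero-based indicators to the 1, 2, ... numbering mentioned in the question.

```python
import pandas as pd

# Hypothetical DataFrame with a made-up "spread" column.
df = pd.DataFrame(
    {"spread": [0.5, 1.2, 0.3, 2.8, 1.9, 0.7, 3.1, 2.2, 1.5, 0.9]}
)

# labels=False returns zero-based bin indicators; + 1 makes them 1-based.
df["spread_bin"] = pd.qcut(df["spread"], 5, labels=False) + 1

print(df)
```

Each row then carries its quantile bin number alongside the original value.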
Hope this helps somebody else. At the very least it should be easier to search for now. :)