I am trying to bound every value in a DataFrame between 0.01 and 0.99.
I have successfully normalised the data to the range 0 to 1 using .apply(lambda x: (x - x.min()) / (x.max() - x.min())) as follows:
df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 1, 5, 5], 'three' : [4,4,2,2]})
df[['two', 'three']] = df[['two', 'three']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
df
Now I want to bound all values between 0.01 and 0.99.
This is what I have tried:
def bound_x(x):
    if x == 1:
        return x - 0.01
    elif x < 0.99:
        return x + 0.01
df[['two', 'three']].apply(bound_x)
df
But I receive the following error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index two')
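The error can be reproduced without bound_x. This minimal sketch builds the normalised values by hand (0/1 for 'two' and 1/0 for 'three', which is what the min-max step gives for the question's data). It then shows that an if on a boolean Series raises, while an explicit reduction such as any() or all() does not.

```python
import pandas as pd

# The question's numeric columns after min-max normalisation, written out by hand.
df = pd.DataFrame({'two': [0.0, 0.0, 1.0, 1.0],
                   'three': [1.0, 1.0, 0.0, 0.0]})

# apply() hands bound_x a whole column, so x == 1 is a Series, not a bool.
mask = df['two'] == 1
print(mask.tolist())  # [False, False, True, True]

raised = False
try:
    if mask:  # `if` calls bool(mask), which pandas refuses to do
        pass
except ValueError as err:
    raised = True
    print(err)  # "The truth value of a Series is ambiguous. ..."

# Reducing explicitly is allowed: any()/all() each return a single boolean.
print(mask.any(), mask.all())
```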
There's an app, err, a clip method, for that:
yields
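Here is a minimal sketch of the clip approach on the question's frame. It assumes only the numeric columns 'two' and 'three' should be bounded. It also shows an element-wise alternative: bound_scalar is a hypothetical helper that Series.map applies to one number at a time, so the comparison logic never sees a whole Series.

```python
import pandas as pd

df = pd.DataFrame({'one': ['AAL', 'AAL', 'AAPL', 'AAPL'],
                   'two': [1, 1, 5, 5],
                   'three': [4, 4, 2, 2]})
cols = ['two', 'three']

# Min-max normalise each numeric column to [0, 1].
normalised = df[cols].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Vectorised bounding: values below 0.01 become 0.01, values above 0.99 become 0.99.
# Only the numeric columns are clipped; the string column 'one' is left untouched.
clipped = normalised.clip(lower=0.01, upper=0.99)

# Element-wise alternative (hypothetical helper): Series.map calls the
# function once per value, so `v` is always a single number.
def bound_scalar(v):
    return min(max(v, 0.01), 0.99)

elementwise = normalised.apply(lambda col: col.map(bound_scalar))

# Write the bounded values back into the original frame.
df[cols] = clipped
print(df)
```

Both routes give the same numbers here. clip is the vectorised one, so it is usually the better choice on larger frames.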
The problem with df[['two', 'three']].apply(bound_x) is that bound_x gets passed a whole Series, such as df['two'], and then if x == 1 requires x == 1 to be evaluated in a boolean context. But x == 1 is a boolean Series (one True or False per row), and Python tries to reduce this Series to a single boolean value, True or False. Pandas follows the NumPy convention of raising an error when you try to convert a Series (or array) to a bool.
So I had a similar problem where I wanted customized normalization, because a regular percentile of the data or a z-score was not adequate. Sometimes I knew what the feasible max and min of the population were, and therefore wanted to define them other than from my sample, or use a different midpoint, or whatever! So I built a custom function (I used extra steps in the code here to make it as readable as possible):
This will take in a pandas Series, or even just a list, and normalize it to your specified low, center, and high points. There is also a shrink factor, which lets you scale the data down away from 0 and 1 (I had to do this when combining colormaps in matplotlib; see "Single pcolormesh with more than one colormap using Matplotlib"). You can likely see how the code works, but basically, say you have values [-5, 1, 10] in a sample but want to normalize based on a range of -7 to 7 (so anything above 7, like our "10", is effectively treated as a 7) with a midpoint of 2, and shrink the result to fit a 256 RGB colormap:
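As an illustration, here is a minimal sketch of a function with the behaviour just described. It is not the author's original code. The name custom_normalize, its parameters, and the exact mapping are assumptions. The sketch uses a piecewise-linear map that sends low to 0, center to 0.5 and high to 1, and it assumes low < center < high. The shrink factor pulls the result toward 0.5.

```python
import numpy as np
import pandas as pd

def custom_normalize(values, low, center, high, shrink=1.0):
    """Hypothetical sketch: map low -> 0, center -> 0.5, high -> 1.

    Assumes low < center < high. Values outside [low, high] are treated
    as low or high. `shrink` (assumed to be in (0, 1]) pulls results toward
    0.5, so the output lies in [0.5 - shrink / 2, 0.5 + shrink / 2].
    """
    x = np.clip(np.asarray(values, dtype=float), low, high)
    # Two linear pieces, so the center can sit anywhere between low and high.
    out = np.where(
        x <= center,
        0.5 * (x - low) / (center - low),
        0.5 + 0.5 * (x - center) / (high - center),
    )
    # Compress the [0, 1] result symmetrically around 0.5.
    out = 0.5 + (out - 0.5) * shrink
    # Keep the index and name when a Series comes in.
    if isinstance(values, pd.Series):
        return pd.Series(out, index=values.index, name=values.name)
    return out

# The example from the text: range -7 to 7, midpoint 2, so 10 is treated as 7.
example = custom_normalize([-5, 1, 10], low=-7, center=2, high=7)
print(example)  # roughly 0.111, 0.444, 1.0
```

If low and high are set to a column's minimum and maximum, center to their midpoint, and shrink=0.98, the output spans 0.01 to 0.99. That rescales the data into the question's range rather than clipping it.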
It can also turn your data inside out. This may seem odd, but I found it useful for heatmapping: say you want a darker color for values closer to 0 rather than for the high/low values. You could then heatmap based on data normalized with insideout=True:
So now "2", which is closest to the center (defined as "1"), is the highest value.
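The inside-out idea can be sketched on its own. The sketch assumes the data are already normalised so that the center sits at 0.5. Folding around 0.5 then sends the center to 1 and both extremes to 0, so values near the center score highest. The helper name inside_out is hypothetical and is not the author's insideout option.

```python
import numpy as np

def inside_out(normalized):
    """Hypothetical helper: fold values in [0, 1] around 0.5.

    The center (0.5) maps to 1, and both ends (0 and 1) map to 0.
    """
    x = np.asarray(normalized, dtype=float)
    return 1.0 - np.abs(2.0 * x - 1.0)

folded = inside_out([0.0, 0.25, 0.5, 0.75, 1.0])
print(folded)
```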
Anyway, I thought my issue was very similar to yours, and this function could be useful to you.