Python Pandas - Quantile calculation manually

Posted 2020-07-22 10:06

Question:

I am trying to calculate a quantile for a column's values manually, but the value I get from the formula does not match the result from Pandas. I looked around for solutions but did not find the right answer.

In [54]: df

Out[54]:
      data1     data2 key1 key2
0 -0.204708  1.393406    a  one
1  0.478943  0.092908    a  two
2  1.965781  1.246435    a  one

In [55]: grouped = df.groupby('key1')
In [56]: grouped['data1'].quantile(0.9)
Out[56]:
key1
a 1.668413

Using the formula to find it manually, n is 3 because there are 3 values in the data1 column:

position = quantile * (n + 1)

Applying it to the data1 column:

= 0.9 * (n + 1)
= 0.9 * 4
= 3.6

So the 3.6th position is 1.965781. Why does pandas give 1.668413?

Answer 1:

With its default linear interpolation, quantile places the sorted values at evenly spaced percentages from 0% to 100%: the value at 0-based sorted index i sits at i/(n-1).

In your case:

  • -0.204708 would be considered the 0th percentile,
  • 0.478943 would be considered the 50th percentile and
  • 1.965781 would be considered the 100th percentile.

So you could calculate the 90th percentile the following way (using linear interpolation between the 50th and 100th percentiles):

>>> import numpy as np
>>> # x is not sorted: x[2] is the median (50%) and x[1] is the maximum (100%)
>>> x = np.array([-0.204708, 1.965781, 0.478943])
>>> ninetieth_percentile = (x[1] - x[2]) / 0.5 * 0.4 + x[2]
>>> ninetieth_percentile
1.6684133999999999

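Note that 0.5 is the share of the distribution spanned by the two points (50% to 100%), and 0.4 is how far above the 50% mark the target lies (0.5 + 0.4 = 0.9). Your formula, 0.9(n+1) = 3.6, points past the last of the three values, whereas the default method uses the 0-based position 0.9(n-1) = 1.8, that is, 80% of the way from 0.478943 to 1.965781. Hope this makes sense.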
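
The same interpolation can be written for any q and any number of values. This is a minimal sketch in plain Python, assuming the default linear interpolation; linear_quantile is a hypothetical helper name, not a pandas or NumPy function:

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Fractional 0-based position in the sorted data: q * (n - 1)
    pos = q * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower  # how far to move from data[lower] towards data[upper]
    return data[lower] + frac * (data[upper] - data[lower])

data1 = [-0.204708, 0.478943, 1.965781]
result = linear_quantile(data1, 0.9)
print(round(result, 6))
```

For the three data1 values the position is 0.9 * 2 = 1.8, so the result lies 80% of the way from 0.478943 to 1.965781, which matches the value pandas reports in the question.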