How to get percentage of counts of a column after

Posted 2019-05-05 17:40

I'm trying to get the distribution of grades for each rank, per name, in my data. However, I can't figure out how to get the proportion (percentage) of each grade count relative to its rank group. Here's an example:

df.head()

name    rank    grade
Bob     1       A
Bob     1       A
Bob     1       B
Bob     1       C
Bob     2       B
Bob     3       C
Joe     1       C
Joe     2       B
Joe     2       B
Joe     3       A
Joe     3       B
Joe     3       B

I use grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size() to get the count of each grade within its (name, rank) group:

name    rank    grade
Bob     1       A     2
                B     1
                C     1
        2       B     1
        3       C     1
Joe     1       C     1
        2       B     2
        3       A     1
                B     2
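To reproduce this, here is a minimal sketch that rebuilds the sample data from the df.head() listing above (assuming those 12 rows are the whole dataset) and computes grade_count with the corrected groupby syntax:

```python
import pandas as pd

# Sample data reconstructed from the df.head() listing (assumed to be complete)
df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# Count rows per (name, rank, grade); the column is selected with
# groupby(...)["grade"], not the invalid groupby(...).["grade"]
grade_count = df.groupby(["name", "rank", "grade"])["grade"].size()
print(grade_count)
```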

Now, for each count, I'd like its proportion of the (name, rank) group total (i.e. the share of each grade within a rank, for each name). This is the output I'd like:

name    rank    grade
Bob     1       A     2    0.5   (Bob @ rank 1 had 4 grades, and 50% of them are A's)
                B     1    0.25
                C     1    0.25
        2       B     1    1
        3       C     1    1
Joe     1       C     1    1
        2       B     2    1
        3       A     1    0.33
                B     2    0.67
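If only the proportions are needed, pandas can also compute them in one step with SeriesGroupBy.value_counts(normalize=True). This is a swapped-in alternative, not the approach used in the question; the sketch assumes the same 12-row sample data, and sort_index() is there only because value_counts orders each group by frequency:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})

# normalize=True divides each grade's count by the size of its (name, rank) group
proportions = (
    df.groupby(["name", "rank"])["grade"]
      .value_counts(normalize=True)
      .sort_index()  # restore name/rank/grade order instead of frequency order
)
print(proportions.round(2))
```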

I've managed to get the totals of each rank group by using rank_totals = grade_count.groupby(level=[0, 1]).sum(), which results in:

name    rank    
Bob     1       4
        2       1
        3       1
Joe     1       1
        2       2
        3       3

How can I divide the numbers from grade_count by their corresponding rank totals in rank_totals?
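One way to divide by rank_totals directly is to look up each row's (name, rank) total and then divide element-wise. The sketch below uses reindex for that lookup (my choice of technique, not something from the question) and assumes the index levels keep the names from the groupby keys, so the grade level can be dropped by name:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"])["grade"].size()
rank_totals = grade_count.groupby(level=[0, 1]).sum()

# Drop the "grade" level to get one (name, rank) label per row of grade_count
# (with repeats), then fetch the matching total for each of those labels
totals_per_row = rank_totals.reindex(grade_count.index.droplevel("grade"))

# Both objects are in grade_count's row order, so divide positionally
proportions = grade_count / totals_per_row.to_numpy()
print(proportions)
```

Because the dropped-level index has one entry per row of grade_count, the looked-up totals repeat (4, 4, 4 for Bob at rank 1) and line up with grade_count row by row.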

Tags: python pandas group-by aggregate percentage

1 answer

虎瘦雄心在

Reply #2 · 2019-05-05 18:31

Group the Series by its name and rank index levels and use transform to compute each group's total and broadcast it back to every row of that group. Then divide the original Series by the result:

grade_count.groupby(level=[0, 1]).transform('sum')
Out[19]: 
name  rank  grade
Bob   1     A        4
            B        4
            C        4
      2     B        1
      3     C        1
Joe   1     C        1
      2     B        2
      3     A        3
            B        3
dtype: int64

grade_count / grade_count.groupby(level=[0, 1]).transform('sum')
Out[20]: 
name  rank  grade
Bob   1     A        0.500000
            B        0.250000
            C        0.250000
      2     B        1.000000
      3     C        1.000000
Joe   1     C        1.000000
      2     B        1.000000
      3     A        0.333333
            B        0.666667
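Putting the answer together, here is a minimal runnable sketch on the same sample data (assumed to be the 12 rows from the question). It passes 'sum' as a string to transform, which avoids a FutureWarning that some recent pandas versions emit when the builtin sum is passed, and it places the counts and proportions side by side as in the layout the question asked for; the column names count and proportion are my own:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["Bob"] * 6 + ["Joe"] * 6,
    "rank":  [1, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3, 3],
    "grade": ["A", "A", "B", "C", "B", "C", "C", "B", "B", "A", "B", "B"],
})
grade_count = df.groupby(["name", "rank", "grade"])["grade"].size()

# transform("sum") returns the group total for every row, so the result has
# exactly the same index as grade_count and divides element-wise
group_totals = grade_count.groupby(level=[0, 1]).transform("sum")
proportion = grade_count / group_totals

# Counts and rounded proportions side by side, one row per (name, rank, grade)
result = pd.DataFrame({"count": grade_count, "proportion": proportion.round(2)})
print(result)
```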
