I have the following DataFrame:
In [260]: df
Out[260]:
    size market vegetable  confirm availability
0  Large    ABC    Tomato                   NaN
1  Large    XYZ    Tomato                   NaN
2  Small    ABC    Tomato                   NaN
3  Large    ABC     Onion                   NaN
4  Small    ABC     Onion                   NaN
5  Small    XYZ     Onion                   NaN
6  Small    XYZ     Onion                   NaN
7  Small    XYZ   Cabbage                   NaN
8  Large    XYZ   Cabbage                   NaN
9  Small    ABC   Cabbage                   NaN
1) How do I get, for each vegetable, the size that occurs most often?
I used groupby on vegetable and size to get the counts below, but I only need the rows that hold the maximum size count for each vegetable:
In [262]: df.groupby(['vegetable','size']).count()
Out[262]:
                 market  confirm availability
vegetable size
Cabbage   Large       1                     0
          Small       2                     0
Onion     Large       1                     0
          Small       3                     0
Tomato    Large       2                     0
          Small       1                     0
df2['vegetable','size'] = df.groupby(['vegetable','size']).count().apply( some logic )
Required df:
  vegetable   size  max_count
0   Cabbage  Small          2
1     Onion  Small          3
2    Tomato  Large          2
2) From df I can now say that small cabbages are available in the largest quantity, so I need to populate the confirm availability column with Small for all Cabbage rows, and likewise for the other vegetables. How can I do this? The expected result is:
    size market vegetable confirm availability
0  Large    ABC    Tomato                Large
1  Large    XYZ    Tomato                Large
2  Small    ABC    Tomato                Large
3  Large    ABC     Onion                Small
4  Small    ABC     Onion                Small
5  Small    XYZ     Onion                Small
6  Small    XYZ     Onion                Small
7  Small    XYZ   Cabbage                Small
8  Large    XYZ   Cabbage                Small
9  Small    ABC   Cabbage                Small
You can assign the grouped DataFrame to another object, then group again on the 'vegetable' index level to get the maximum value you need.
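A minimal sketch of that idea, assuming the question's data is rebuilt by hand and that the variable names `counts`, `is_max`, and `result` are illustrative rather than taken from the original answer. It uses `size()` instead of `count()` so the all-NaN column does not get in the way:

```python
import pandas as pd

# Rebuild the question's data (the all-NaN column is left out here).
df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato"] * 3 + ["Onion"] * 4 + ["Cabbage"] * 3,
})

# Step 1: keep the grouped counts in their own object.
# The result is a Series indexed by (vegetable, size).
counts = df.groupby(["vegetable", "size"]).size()

# Step 2: group again on the 'vegetable' index level and broadcast
# each vegetable's maximum count back onto every row of that group.
is_max = counts == counts.groupby(level="vegetable").transform("max")

# Step 3: keep only the rows that hold the maximum.
# Note: if two sizes tie for a vegetable, both rows are kept.
result = counts[is_max].reset_index(name="max_count")
print(result)
```

For the data in the question, this should give one row per vegetable: Cabbage/Small/2, Onion/Small/3, and Tomato/Large/2.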
You can use GroupBy with count, then sort and drop duplicates:
1)
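A sketch of how the count, sort, and drop-duplicates steps could look for the first part, assuming the question's data is rebuilt by hand. The names `counts` and `res` are illustrative, and counting the `market` column is an assumption made because it has no missing values:

```python
import pandas as pd

# Rebuild the question's data (the all-NaN column is left out here).
df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato"] * 3 + ["Onion"] * 4 + ["Cabbage"] * 3,
})

# Count rows per (vegetable, size) pair.
counts = (df.groupby(["vegetable", "size"])["market"]
            .count()
            .reset_index(name="max_count"))

# Sort so the largest count comes first, then keep only the first
# (largest) row seen for each vegetable.
res = (counts.sort_values("max_count", ascending=False)
             .drop_duplicates("vegetable")
             .sort_values("vegetable")
             .reset_index(drop=True))
print(res)
```

If two sizes tie for a vegetable, `drop_duplicates` keeps whichever happens to sort first, so a tie-breaking rule would need to be added for such data.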
2)
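For the second part, one possible sketch is to turn the per-vegetable result into a lookup Series and map it onto the vegetable column. The name `best` is illustrative, and the use of `Series.map` here is an assumption about how the step could be done, not necessarily the original answer's code:

```python
import pandas as pd

# Rebuild the question's data, including the empty target column.
df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato"] * 3 + ["Onion"] * 4 + ["Cabbage"] * 3,
    "confirm availability": float("nan"),
})

# Most frequent size per vegetable, as a Series indexed by vegetable.
best = (df.groupby(["vegetable", "size"])["market"]
          .count()
          .reset_index(name="max_count")
          .sort_values("max_count", ascending=False)
          .drop_duplicates("vegetable")
          .set_index("vegetable")["size"])

# Look up each row's vegetable in that Series to fill the column.
df["confirm availability"] = df["vegetable"].map(best)
print(df)
```

With the question's data, every Tomato row should receive Large and every Onion and Cabbage row should receive Small.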