I hope I can find help for my question. I am searching for a solution for the following problem:
I have a DataFrame like:
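
|   | Sp  | Mt | Value | count  |
|---|-----|----|-------|--------|
| 0 | MM1 | S1 | a     | **3**  |
| 1 | MM1 | S1 | n     | 2      |
| 2 | MM1 | S3 | cb    | **5**  |
| 3 | MM2 | S3 | mk    | **8**  |
| 4 | MM2 | S4 | bg    | **10** |
| 5 | MM2 | S4 | dgd   | 1      |
| 6 | MM4 | S2 | rd    | 2      |
| 7 | MM4 | S2 | cb    | 2      |
| 8 | MM4 | S2 | uyi   | **7**  |

My objective is to get the rows whose count is the maximum within each group, like:

|   | Sp  | Mt | Value | count  |
|---|-----|----|-------|--------|
| 0 | MM1 | S1 | a     | **3**  |
| 2 | MM1 | S3 | cb    | **5**  |
| 3 | MM2 | S3 | mk    | **8**  |
| 4 | MM2 | S4 | bg    | **10** |
| 8 | MM4 | S2 | uyi   | **7**  |
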
Does anybody know how I can do this in pandas or in plain Python?
UPDATE
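I didn't give enough detail in my question. I want to group by ['Sp','Mt']. Let's take a second example:

|   | Sp  | Mt | Value | count |
|---|-----|----|-------|-------|
| 4 | MM2 | S4 | bg    | 10    |
| 5 | MM2 | S4 | dgd   | 1     |
| 6 | MM4 | S2 | rd    | 2     |
| 7 | MM4 | S2 | cb    | 8     |
| 8 | MM4 | S2 | uyi   | 8     |

For the above example, I want to get ALL the rows where count equals the maximum of their group, e.g.:

| Sp  | Mt | Value | count |
|-----|----|-------|-------|
| MM2 | S4 | bg    | 10    |
| MM4 | S2 | cb    | 8     |
| MM4 | S2 | uyi   | 8     |
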
To get the indices of the original DF you can do:
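A minimal sketch of that approach, assuming the goal is the index labels of the rows that hold their group's maximum. It groups by ['Sp', 'Mt'] as the OP's update requests and compares each row with `transform("max")`; the names `is_max` and `idx` are illustrative. Because (MM1, S3) is a single-row group, row 2 counts as its own group maximum.

```python
import pandas as pd

# First example DataFrame from the question
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# transform("max") broadcasts each group's maximum back onto every row,
# so comparing it with "count" gives a boolean mask aligned with df.index
is_max = df.groupby(["Sp", "Mt"])["count"].transform("max") == df["count"]

# Index labels of the original DataFrame whose count equals the group max
idx = df.index[is_max]
print(list(idx))  # [0, 2, 3, 4, 8]
```

`df[is_max]` then returns the matching rows themselves.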
Note that if you have multiple max values per group, all will be returned.
Update
On a Hail Mary chance that this is what the OP is requesting:
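One possible reading, sketched under the assumption that the OP wants to keep every row and annotate it with its group's maximum instead of filtering. The column name `count_max` is a hypothetical choice; the data is the second example from the question.

```python
import pandas as pd

# Second example DataFrame from the question (original index labels kept)
df = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)

# Hypothetical column name: every row keeps its data and gains its
# (Sp, Mt) group's maximum count alongside it
df["count_max"] = df.groupby(["Sp", "Mt"])["count"].transform("max")
print(df)
```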
Having tried the solution suggested by Zelazny on a relatively large DataFrame (~400k rows), I found it to be very slow. Here is an alternative that I found to run orders of magnitude faster on my data set.
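A sketch of one such faster pattern, assuming a sort-then-deduplicate approach (not necessarily the answerer's exact code): sort by count in descending order, then keep the first row seen for each (Sp, Mt) pair with `drop_duplicates`. Unlike the transform mask, this keeps exactly one row per group even when maxima are tied; the stable `mergesort` makes the choice among tied rows deterministic.

```python
import pandas as pd

# First example DataFrame from the question
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# Assumed sort-then-deduplicate pattern: the largest count of each group
# comes first, so drop_duplicates (keep="first" by default) retains it
top = (
    df.sort_values("count", ascending=False, kind="mergesort")
      .drop_duplicates(subset=["Sp", "Mt"])
      .sort_index()  # restore the original row order for readability
)
print(top)
```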
An easy solution is to apply the idxmax() function to get the indices of the rows with the max values. Note that this keeps exactly one row per group (the first row holding the maximum), so tied maxima are not all returned.
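A minimal sketch of the idxmax approach on both example DataFrames from the question, grouping by ['Sp', 'Mt']. Since `idxmax` returns index labels, they are passed to `df.loc`; the second DataFrame shows that only the first of the two tied rows of (MM4, S2), label 7, is returned.

```python
import pandas as pd

# First example DataFrame from the question
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# idxmax gives, per (Sp, Mt) group, the label of the first row with the max
result = df.loc[df.groupby(["Sp", "Mt"])["count"].idxmax()]
print(result)

# Second example: rows 7 and 8 tie at 8, but only label 7 is selected
df2 = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)
result2 = df2.loc[df2.groupby(["Sp", "Mt"])["count"].idxmax()]
print(result2)
```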
For me, the easiest solution is to keep the rows whose count equals the group maximum. The following one-line command is enough:
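A sketch of that one-liner on the second example, assuming the grouping keys are ['Sp', 'Mt'] as in the OP's update. Because it compares every row against its group's maximum, both tied rows of (MM4, S2) survive.

```python
import pandas as pd

# Second example DataFrame from the question (original index labels kept)
df = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)

# Keep every row whose count equals the maximum of its (Sp, Mt) group
result = df[df["count"] == df.groupby(["Sp", "Mt"])["count"].transform("max")]
print(result)
```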
Use the `groupby` and `idxmax` methods:

1. Convert the column `date` to datetime.
2. Get the index of the max of column `date` after grouping by `ad_id`.
3. Select the wanted data with that index.
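The three steps above can be sketched as follows. The column names `ad_id` and `date` come from the answer, but the DataFrame contents and the extra `clicks` column are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column names ad_id and date come from the answer
df = pd.DataFrame({
    "ad_id": [1, 1, 2, 2, 2],
    "date": ["2020-01-05", "2020-03-01", "2020-02-10", "2020-01-20", "2020-02-28"],
    "clicks": [3, 7, 1, 4, 2],  # hypothetical extra column, carried along
})

# 1. Convert the date strings to datetime64 values
df["date"] = pd.to_datetime(df["date"])

# 2. Index label of the latest date within each ad_id group
idx = df.groupby("ad_id")["date"].idxmax()

# 3. Select those rows from the original DataFrame
df_max = df.loc[idx]
print(df_max)
```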