This question already has an answer here:
- How to select the first row of each group? 8 answers
I have the DataFrame df shown below. How can I delete duplicates while keeping the row with the minimum value of level for each duplicated pair of item_id and country_id?
+-----------+----------+---------------+
|    item_id|country_id|          level|
+-----------+----------+---------------+
|     312330|  13535670|             82|
|     312330|  13535670|            369|
|     312330|  13535670|            376|
|     319840|  69731210|            127|
|     319840|  69730600|            526|
|     311480|  69628930|            150|
|     311480|  69628930|            138|
|     311480|  69628930|            405|
+-----------+----------+---------------+
The expected output:
+-----------+----------+---------------+
|    item_id|country_id|          level|
+-----------+----------+---------------+
|     312330|  13535670|             82|
|     319840|  69731210|            127|
|     319840|  69730600|            526|
|     311480|  69628930|            138|
+-----------+----------+---------------+
I know how to delete duplicates without conditions using dropDuplicates, but I don't know how to do it for my particular case.
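To pin down the semantics I am after, here is a minimal sketch in plain Python (not Spark) that applies the rule to the rows from the first table. The helper name keep_min_level is hypothetical and only used for illustration; the rows are copied from the table above as (item_id, country_id, level) tuples.

```python
# Rows of df as (item_id, country_id, level) tuples, copied from the first table.
rows = [
    (312330, 13535670, 82),
    (312330, 13535670, 369),
    (312330, 13535670, 376),
    (319840, 69731210, 127),
    (319840, 69730600, 526),
    (311480, 69628930, 150),
    (311480, 69628930, 138),
    (311480, 69628930, 405),
]


def keep_min_level(rows):
    """Keep one row per (item_id, country_id) pair: the one with the smallest level.

    Hypothetical helper, written only to illustrate the intended result.
    """
    best = {}  # (item_id, country_id) -> smallest level seen so far
    for item_id, country_id, level in rows:
        key = (item_id, country_id)
        if key not in best or level < best[key]:
            best[key] = level
    # Dicts preserve insertion order, so pairs come out in first-seen order.
    return [(item_id, country_id, level) for (item_id, country_id), level in best.items()]


deduplicated = keep_min_level(rows)
for row in deduplicated:
    print(row)
```

Since level is the only non-key column here, I assume the Spark equivalent is a groupBy on item_id and country_id with a min aggregate on level, but I have not confirmed that this is the idiomatic way to do it.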