I have a huge data set with more than a million rows and some peculiar attributes. I need to filter the data while retaining its other properties.
My data looks like the following:
ID Prop1 Prop2 TotalProp
56891940 G02 G02 2
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
... and so on, for more than a million rows.
I want to remove every ID for which "Prop1" and "Prop2" are the same in all of its rows, such as 56891940. IDs such as 56892558, 56892552 and 56892559 should not be removed: some of their rows have identical "Prop1" and "Prop2", but at least one row per ID has different values, so I want to retain all rows for those IDs, and so on for the rest of the data.
My final output should look like:
ID Prop1 Prop2 TotalProp
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
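
To make the rule concrete, here is a minimal sketch of the filter in plain Python, assuming the data is whitespace-delimited as shown above. The function name `filter_rows` and the variable names are only illustrative, and this is one possible approach rather than a definitive solution: the first pass collects the IDs that have at least one row where Prop1 differs from Prop2, and the second pass keeps every row belonging to those IDs.

```python
def filter_rows(rows):
    """Keep all rows of an ID if at least one of its rows has Prop1 != Prop2.

    `rows` is a list of (ID, Prop1, Prop2, TotalProp) tuples.
    """
    # Pass 1: IDs with at least one row where the two properties differ.
    keep_ids = {row_id for row_id, prop1, prop2, _total in rows if prop1 != prop2}
    # Pass 2: keep every row (including the Prop1 == Prop2 ones) for those IDs,
    # preserving the original row order.
    return [row for row in rows if row[0] in keep_ids]


# Sample input copied from the data above (header line skipped when parsing).
sample = """\
ID Prop1 Prop2 TotalProp
56891940 G02 G02 2
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
"""

rows = [tuple(line.split()) for line in sample.splitlines()[1:]]
filtered = filter_rows(rows)

for row in filtered:
    print(*row)
```

On this sample only the single 56891940 row is dropped. The same two-pass idea should carry over to a data frame library (group by ID, keep groups where any row has Prop1 != Prop2), which may be more practical for millions of rows.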