Given the following list:
[
('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)),
('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0))
]
I would like to group these by the first, second, fourth and fifth columns in the tuple and sum the 3rd. For this example I'll name the columns as col1, col2, col3, col4, col5.
In SQL I would do something like this:
select col1, col2, sum(col3), col4, col5 from my table
group by col1, col2, col4, col5
Is there a "cool" way to do this or is it all a manual loop?
If you find yourself doing this a lot with large datasets, you might want to look at the pandas library, which has lots of nice facilities for doing this kind of thing.
You want
itertools.groupby
.Note that
groupby
expects the input to be sorted, so you may need to do that before hand:(NOTE: output reformatted)