I'm trying to find an efficient way to do the following:
I have this sample:
sample = [['no',2, 6], ['ja',5,7], ['no',4,9], ['ja',10,11], ['ap',7,12]]
and would need
res = [['no', 6, 15], ['ja', 15, 18], ['ap',7,12]]
i.e. sum the corresponding values of the sublists where the first element is the same.
Thanks a lot
My code is:
codes = list(set([element[0] for element in sample]))
res = []
for code in codes:
    aux = [code]
    res01 = 0
    res02 = 0
    for element in sample:
        if element[0] == code:
            res01 += element[1]
            res02 += element[2]
    aux += [res01, res02]
    res.append(aux)
Using defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(lambda: [0, 0])
>>> for a, b, c in sample:
...     d[a][0] += b
...     d[a][1] += c
#driver values :
IN : sample = [['no',2, 6], ['ja',5,7], ['no',4,9], ['ja',10,11], ['ap',7,12]]
OUT : d = defaultdict(<function <lambda> at 0x7f4349f17620>,
{'no': [6, 15], 'ja': [15, 18], 'ap': [7, 12]})
Since the output is structured like this, I would suggest storing it in a dict,
as future processing will be easier that way.
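As a small illustration of why the dict form can be more convenient, here is a sketch comparing a lookup in both shapes. The names `totals` and `as_list` are placeholders I made up for this example; `totals` stands in for the dict built above.

```python
# Hypothetical names: `totals` mirrors the summed dict, `as_list` the list form.
totals = {'no': [6, 15], 'ja': [15, 18], 'ap': [7, 12]}
as_list = [['no', 6, 15], ['ja', 15, 18], ['ap', 7, 12]]

# With a dict, fetching the sums for one code is a direct key access.
ja_from_dict = totals['ja']

# With a list of lists, the same lookup has to scan the rows.
ja_from_list = next(row[1:] for row in as_list if row[0] == 'ja')

print(ja_from_dict, ja_from_list)
```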
In case you still want the output as a list, just map the dict as follows:
>>> [ [key]+ele for key,ele in d.items()]
=> [['no', 6, 15], ['ja', 15, 18], ['ap', 7, 12]]
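If you want the whole thing as one reusable step, the same idea can be sketched as a function. This is only a sketch under a few assumptions: the name `sum_by_key` is made up for this example, every row is assumed to have the same number of numeric columns after the key, and it relies on dicts keeping insertion order (Python 3.7+) so the keys come out in first-seen order.

```python
def sum_by_key(rows):
    """Sum the numeric columns of rows that share the same first element.

    Hypothetical helper: assumes each row looks like [key, num, num, ...]
    and that all rows have the same number of numeric columns.
    """
    totals = {}
    for key, *values in rows:
        if key not in totals:
            # First time we see this key: start from a copy of its values.
            totals[key] = list(values)
        else:
            # Key seen before: add each value to the running sum in place.
            acc = totals[key]
            for i, v in enumerate(values):
                acc[i] += v
    # Turn {key: [sums...]} back into [[key, sums...], ...].
    return [[key, *sums] for key, sums in totals.items()]


sample = [['no', 2, 6], ['ja', 5, 7], ['no', 4, 9], ['ja', 10, 11], ['ap', 7, 12]]
res = sum_by_key(sample)
print(res)
```

This makes a single pass over `sample`, whereas the loop in the question rescans the whole list once per distinct code.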
import pandas as pd
x=pd.DataFrame(sample).groupby(0).agg({1:"sum", 2:"sum"})
d=x.to_dict(orient="split")
#{'columns': [1, 2], 'data': [[7, 12], [15, 18], [6, 15]], 'index': ['ap', 'ja', 'no']}
[d["data"][i]+[d["index"][i]] for i in range(0, len(d["data"]))]
-----OUTPUT-----------
[[7, 12, 'ap'], [15, 18, 'ja'], [6, 15, 'no']]