I have a list of lists that looks like:
[['chr1', '3088', '1', 744, 'L1MCc_dup1'],
 ['chr1', '3089', '1', 744, 'L1MCc_dup1'],
 ['chr1', '3090', '1', 744, 'L1MCc_dup1'],
 ['chr1', '15037', '1', 96, 'MER63B'],
 ['chr1', '15038', '1', 96, 'MER63B'],
 ['chr1', '15039', '1', 96, 'MER63B'],
 ['chr1', '15040', '1', 96, 'MER63B'],
 ['chr1', '19465', '1', 418, 'MLT2B4_dup1'],
 ['chr1', '19466', '1', 418, 'MLT2B4_dup1'],
 ['chr1', '19467', '1', 418, 'MLT2B4_dup1']]
I need to make the equivalent of a SUMIFS function in Python (as the file is too big for Excel) to sum the contents of column 3 based on the identifier in column 5 (the output can be some version of: L1MCc_dup1 is 3, MER63B is 4, and MLT2B4_dup1 is 3).
Any advice/help to make this function?
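For a single identifier, a generator expression inside `sum` may be enough. The following is a minimal sketch, assuming the rows are stored in a list named `data` (a hypothetical name, not from the question):

```python
# `data` is a hypothetical name for the list of lists shown in the question.
data = [
    ['chr1', '3088', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3089', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3090', '1', 744, 'L1MCc_dup1'],
    ['chr1', '15037', '1', 96, 'MER63B'],
    ['chr1', '15038', '1', 96, 'MER63B'],
    ['chr1', '15039', '1', 96, 'MER63B'],
    ['chr1', '15040', '1', 96, 'MER63B'],
    ['chr1', '19465', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19466', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19467', '1', 418, 'MLT2B4_dup1'],
]

# Add up the fourth field (index 3) of every row whose last field matches.
total = sum(row[3] for row in data if row[4] == 'MLT2B4_dup1')
print(total)  # 1254
```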
Sum of all column 4 values (I assume you meant that column because it is the only int column), where the last column equals 'MLT2B4_dup1'. You can change that to any other condition, of course.

Use a dictionary:
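A minimal sketch of the dictionary approach, assuming the rows are in a list named `data` (a hypothetical name) and that column 3 holds numeric strings that should be converted with `int` to match the totals asked for in the question:

```python
# `data` is a hypothetical name for the list of lists shown in the question.
data = [
    ['chr1', '3088', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3089', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3090', '1', 744, 'L1MCc_dup1'],
    ['chr1', '15037', '1', 96, 'MER63B'],
    ['chr1', '15038', '1', 96, 'MER63B'],
    ['chr1', '15039', '1', 96, 'MER63B'],
    ['chr1', '15040', '1', 96, 'MER63B'],
    ['chr1', '19465', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19466', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19467', '1', 418, 'MLT2B4_dup1'],
]

d = {}
for row in data:
    key = row[4]  # identifier in the last column
    # Column 3 (index 2) is stored as a string, so convert it before adding.
    d[key] = d.get(key, 0) + int(row[2])

print(d)  # {'L1MCc_dup1': 3, 'MER63B': 4, 'MLT2B4_dup1': 3}
```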
After this loop, d will map the key values in the last column to the desired sums.

You could also use collections.defaultdict instead of a normal dictionary.
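The `defaultdict` variant might look like the sketch below. It again assumes a hypothetical list named `data` and sums column 3 as integers; `defaultdict(int)` starts every missing key at 0, so the explicit lookup with a default is no longer needed.

```python
from collections import defaultdict

# `data` is a hypothetical name for the list of lists shown in the question.
data = [
    ['chr1', '3088', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3089', '1', 744, 'L1MCc_dup1'],
    ['chr1', '3090', '1', 744, 'L1MCc_dup1'],
    ['chr1', '15037', '1', 96, 'MER63B'],
    ['chr1', '15038', '1', 96, 'MER63B'],
    ['chr1', '15039', '1', 96, 'MER63B'],
    ['chr1', '15040', '1', 96, 'MER63B'],
    ['chr1', '19465', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19466', '1', 418, 'MLT2B4_dup1'],
    ['chr1', '19467', '1', 418, 'MLT2B4_dup1'],
]

d = defaultdict(int)  # missing keys default to 0
for row in data:
    d[row[4]] += int(row[2])  # group by last column, sum column 3

print(dict(d))  # {'L1MCc_dup1': 3, 'MER63B': 4, 'MLT2B4_dup1': 3}
```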