I have an xlsx that I'm parsing with openpyxl
.
Column A is Product Name, Column B is revenue, and I want to extract each pair of prouct-revenue values into a dict
. Were there no duplicate products, it would simply be a matter of the creating a dict by mapping ws.columns
appropriately.
The problem is, there are multiple entries for some (but not all) products. For these, I need to sum the values in question, and just return a single key for those products (as for the rest). So if my revenue spreadsheet contains the following:
I want to sum the values of Revenue for Banana before returning the dict. The desired outcome then is:
{'Banana': 7.2, 'Apple': 1.7, 'Pear': 6.2, 'Kiwi': 1.2}
The following would work OK were there no duplicates:
revenue{}
i = 0;
for product in ws.columns[0]:
revenue[product.value] = ws.columns[1][i].value
i+=1
But obviously it breaks down when it encounters duplicates. I could try using a MultiDict(), which will give a structure from which I can perform the addition and create my final dict
:
d = MultiDict()
for i in range(len(ws.columns[1])):
d.add(ws.columns[0][i].value,ws.columns[1][i].value)
This leaves me with a MultiDict
, which itself is actually a list of tuples, and it all gets a tad convoluted. Is there a neater or standard-library way of achieving the same-key-multiple-times data structure? What about employing zip()
? Doesn't necessarily have to be dict-like. I just need to be able to create a dict
from it (and then perform the addition).
This should be close to what you want, assuming you can transform your data to a list of key-value tuples:
collections.defaultdict
was made for this type of use case.Assuming
length
of second column is less than the first one; one can simply group rows by value in the first column and sum the rest like the following: